Hunt Library Heating & Cooling
12745: AIS Project, Final Report
Aditya Chaganti, Alex Shu, Joseph Dryer, Yutong Guo
4/6/16


Executive Summary

A solution is only valuable if there was a problem to begin with. This may sound exceedingly simple, yet

many start-up companies and other businesses fail because they address a problem or market that simply

doesn’t exist. Our project has its origins in a complaint here and there about the indoor conditions at Hunt

Library. Our professor mentioned that some employees who work there keep heaters under their desks.

Some team members who study there started to recall being too hot or too cold often. User interviews

revealed it wasn’t just this – throughout the building, various employees were happy to explain to us the

thermal imbalance they experience and its negative effects. Thus, out of a variety of different problems

that we could solve in a semester, we decided Hunt Library was the choice – a real problem that affected

real people.

The heating and cooling of a building is complex, governed by a variety of natural phenomena like

outdoor temperature, wind speed, irradiation, as well as indoor factors such as occupancy, lighting,

computers, etc. Mechanical systems must add or remove heat from the building to balance these forces

and produce a comfortable environment. Our project doesn’t solve the problem in Hunt Library – we do

not suggest that valve C on air handling unit #1 be turned 15 degrees clockwise. Rather, we have produced

a method for assessing the problem and providing the information needed to mitigate it.

We designed and built a scalable sensor network that measures temperature and humidity at any

desired resolution via mobile, battery powered sensor nodes. We produced a data management system

that collects these measurements and stores them for later use. We created an analysis framework that

leverages our network measurements as well as mechanical systems data to identify the driving factors behind internal conditions and to predict them. Results are communicated through our visualization design: although not implemented, we have conceptually designed an application that would deliver real-time and historical indoor-condition data and data analysis outcomes through intelligent graphics.

Through our system, a building manager can make data driven decisions to keep occupants

comfortable and increase energy efficiency. An office worker can predict indoor conditions and come to

work dressed appropriately.

While functional, we recognize that our system is a prototype. We suggest a future development path that

would ensure a reliable, scalable, and functional system. This report documents not only our final design,

but our decision frameworks and survey of technical options along the way.


Contents

1 Introduction
1.1 Problem Setting
1.2 Heating and Cooling Theory
1.3 Visionary Scenario
1.4 Proposed Solution and Project Definition
2 Technology Survey & Decision Framework
2.1 Sensor Networks
2.2 Sensors for Temperature and Humidity
2.3 Decision Framework and Final Selection
3 System Design
3.1 Sensor Node Design
3.1.1 Hardware
3.1.2 Power Source Design
3.1.3 Software
3.2 Data-logger Design
3.2.1 Machine Selection
3.2.2 Software
3.3 Design Limitations and Potential Improvements
4 Data Collection Results & Graphical Exploration
5 Modeling for Prediction and Inference
5.1 Training Data
5.2 Prediction Models
5.2.1 Artificial Neural Network
5.2.2 Linear Regression
5.3 Causal Inference
6 Interactive Data Analysis and Visualization Platform
7 Conclusion
8 References
9 Appendix
9.1 Sensor Node Software: Photon C++ Script
9.2 Raspberry Pi: Logger Python Script
9.3 Raspberry Pi Dropbox Backup Python Script
9.4 Group Account Information for Gmail, Dropbox, Particle Build
9.5 Ipython Notebook for Plotting Data
9.6 Data Analysis Ipython Notebook


1 Introduction

1.1 Problem Setting Hunt Library has a heating and cooling system that fails to keep occupants comfortable. An attendant at the arts

information and reference desk on the fourth floor claims it’s too cold in the winter and too hot in the summer.

She keeps a space heater under her desk to mitigate the problem. The special collections and design librarian on

that same floor, who has been working in the building for over 30 years, says the problem stems from the entire

envelope of the building being made from aluminum and glass. The fourth floor has the most issues, she claims.

She mentioned hot and cold air pockets and issues unique to each side of the building. Her colleagues on the

west side of the building, she says, keep three blankets in their office as well as space heaters. They dress in

layers, and keep fans ready for the summertime. The computer services manager used to work on the second

floor. He was offered a larger office on the fourth floor. He knew there were HVAC issues, but moved there

anyway. Ever since, it’s been a battle. Sometimes he opens his windows. He says it’s “unstable”. Sometimes it’s

75 degrees outside but freezing inside. His thermostat is set to 80 degrees, hoping to gain another degree or two

from the current 68. The architecture librarian on the fourth floor claims that the air in the winter is dry. He says

some people bring in humidifiers. Coming out of the more controlled special collections / rare books room you

can feel the humidity change. On the third floor, the head of interlibrary loan mentioned the heating and cooling

doesn’t respond to what’s happening outside. A warm week in the winter means it’s going to be really hot

inside. On the lowest floor (the basement), at a service desk, the attendant says it's always too cold in the winter. He

keeps a large space heater on the floor to warm the room.

Figure 1: Space heater at the service desk on the bottom floor.

The problem of heating and cooling in Hunt library has been well established and is well known by various

faculty and staff around the Carnegie Mellon campus. Some blame the building construction. Others blame the

HVAC system design, or its controls. If you walk through the library and look at the variety of different

thermostat settings, you might blame that. Either way, there is a real pain point, both for occupant comfort and for the budget.

1.2 Heating and Cooling Theory The erratic indoor conditions in Hunt Library are, as in any building, a function of several factors – both internal

and external. The essence of the physical phenomena dictating internal temperature and humidity is an energy

balance. The most important factors include outdoor temperature and humidity. Solar irradiation during the day

can also have an impact as the envelope of the building heats up and conducts into the interior. Outdoor wind

speed can induce convective heat transfer. Internal factors, such as heat generation from people, computers, and


lights, also have an impact. Finally, managing all of this are the mechanical systems. They add or remove heat

from the system, while also performing important functions such as providing fresh, clean air. The figure below

illustrates these interactions.

Figure 2: Thermal balance of a building.

Most air conditioned buildings are kept somewhere around 70 F. This can be slightly lower in the winter or

higher in the summer, but large deviations from this figure will result in occupant discomfort. If large variations in temperature are present in a building – e.g. if an open area on one floor spans a wide range of temperatures as a result of poor HVAC performance – inefficiency results.

This basic theory of heating and cooling should be kept in mind throughout the remaining sections. Our data analysis framework was designed with careful consideration of the factors discussed here, and the temperature and humidity measurements that follow should be interpreted in the context of this theory.
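The energy balance described above can be sketched as a simple lumped-capacitance model: indoor temperature drifts toward outdoor temperature through the envelope while internal gains and HVAC heat push it back. The sketch below is illustrative only; the thermal resistance, capacitance, and gain values are assumptions, not measured properties of Hunt Library.

```python
# Minimal lumped-capacitance energy balance for a single zone:
#   dT/dt = (T_out - T_in) / (R * C) + (Q_solar + Q_internal + Q_hvac) / C
# All parameter values below are illustrative assumptions.

def simulate_zone(t_out, q_solar, q_internal, q_hvac,
                  t_init=70.0, r=2.0, c=5.0, dt=0.1, steps=1000):
    """Integrate indoor temperature with forward Euler.

    t_out: outdoor temperature (F); q_*: heat gains (arbitrary units);
    r: envelope thermal resistance; c: zone thermal capacitance.
    """
    t_in = t_init
    for _ in range(steps):
        d_t = (t_out - t_in) / (r * c) + (q_solar + q_internal + q_hvac) / c
        t_in += d_t * dt
    return t_in

# With no gains, the zone drifts toward the outdoor temperature.
print(round(simulate_zone(30.0, 0, 0, 0), 1))     # 30.0
# A steady HVAC heat input holds the zone at a comfortable setpoint.
print(round(simulate_zone(30.0, 0, 0, 20.0), 1))  # 70.0
```

The steady state is T_out + R * Q_total, which is why a building with a weak envelope (small R) needs a larger mechanical input to hold the same setpoint.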

1.3 Visionary Scenario Having introduced the problem, we present our vision for two stakeholders involved.

The Facilities Manager

The facilities manager comes into his office and opens up his computer. He has access to data that would help

him make decisions to mitigate energy loss stemming from inefficiencies in Hunt. The data shows him a color

map of current temperature, humidity, and occupancy for each floor of the library. He can view average values over the past two weeks, or see how conditions vary over the course of a day. Empowered with this information, he can combine it with HVAC operation and energy consumption data to make informed decisions for optimization and maintenance.

The Office Worker

The problem: Jill walks into her office in the basement of Hunt library. She takes her coat off and leaves it on the

back of her chair. She’ll need it in a few minutes when she gets cold. She turns on her space heater which helps

her stay comfortable. She has a meeting before lunch. She heads up to the fourth floor and sits down around


the conference table. The meeting starts, but all of a sudden she's feeling very warm. Indeed, she forgot she was wearing two sweatshirts - she usually sheds them before heading upstairs. One day, she hears from a co-worker that a new data driven approach has enabled progress on the heating and cooling problem in the library. A few weeks later, after noticing marked improvement in indoor conditions at work, she scraps the space heater, eases through her morning routine without worrying about packing extra sweaters, and finds herself more focused and productive throughout the day.

1.4 Proposed Solution and Project Definition With the problem defined and a vision for its resolution, we are ready to propose the components of our project

that will comprise an integrated solution.

The first goal of this project is to quantitatively assess indoor conditions – namely, temperature and humidity.

The spatial (variation by location in the building) and temporal (variation over time) profiles for these variables

are to be determined. Data exists for certain zone temperatures from the IBM Smart Campus project, but the

majority of spaces in the building do not have space temperature or humidity data (aside from mechanical

thermostats). Thus, a sensor network which measures temperature and humidity is to be developed. The

network will allow data collection at a fine resolution (6+ measurements per floor) so as to capture the profiles

we seek.

The second goal of this project is to identify important factors driving indoor conditions and predict those

conditions. The IBM Smart Campus project can provide data on the mechanical systems (supply and return air

temperatures, valve positions, flow rates, etc.). Outdoor measurements of temperature, humidity, solar

irradiation, and wind speed are also available. This information can be incorporated into a data analysis

framework to achieve our goal.

The third goal of this project is to visualize the results. We envision an application that allows a user to observe

in real time the indoor conditions through intelligent graphics, look up and download historical data, receive the

outputs from data analysis, and predict future conditions. This application will allow the building manager to

make data driven decisions to correct problems related to indoor conditions, as well as give building occupants advance knowledge of indoor conditions so they can dress or prepare accordingly.
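The first goal implies a simple data pipeline: each node produces timestamped temperature and humidity readings that are appended to a central store. A minimal sketch of such a logger is shown below; the node names and CSV column layout are assumptions for illustration, not our final data format.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = ["timestamp", "node_id", "temp_c", "rh_pct"]

def log_reading(writer, node_id, temp_c, rh_pct, when=None):
    """Append one timestamped sensor reading as a CSV row."""
    when = when or datetime.now(timezone.utc)
    writer.writerow({"timestamp": when.isoformat(),
                     "node_id": node_id,
                     "temp_c": temp_c,
                     "rh_pct": rh_pct})

# Illustrative usage; an in-memory buffer stands in for the log file.
buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=FIELDS)
w.writeheader()
log_reading(w, "floor4-east", 23.1, 31.5)
log_reading(w, "floor4-west", 21.4, 35.0)
print(buf.getvalue().splitlines()[0])  # timestamp,node_id,temp_c,rh_pct
```

Storing one row per reading keeps the spatial and temporal profiles recoverable later by grouping on node_id or on time windows.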

2 Technology Survey & Decision Framework A variety of different technologies are available for implementing a sensor network to measure temperature and

humidity. A comprehensive review of these is included in this section. We consider technical performance, cost,

feasibility, and learning objectives in making a final selection to use for our system.

2.1 Sensor Networks Sensor networks monitor physical conditions (such as temperature, humidity, or sound level) at different locations and pass the resulting data to a central point.

There are a variety of sensor network types in common use, including Wifi, Bluetooth, and radio frequency (RF) communication methods. The main differences among these technologies include data

transmission rate, power consumption, required components, and cost. In this section we explore some of the

different technologies available. Solutions range from complete integrated products to “do-it-yourself”.

Wifi Based Sensor Networks


With wireless internet available almost everywhere these days, the concept of a sensing device that connects to the wireless network is attractive. Such a solution framework is elegant because the ‘receivers’ in this sensor network already exist - wireless routers. All a device has to do is send data over a Wifi connection,

and this data can be stored in the cloud and accessed as needed through a browser or otherwise obtained

through download.

An integrated product that exists on the market is Wireless Sensor Tags. These consist of small, thin squares

(tags) which are sensors for temperature, humidity, motion, and more. The ideal tag for temperature and

humidity measurement is a 13 bit tag with +/- 2 % relative humidity and +/- 0.4 C temperature accuracy at a cost

of $29 each. An ethernet tag manager is also required and has a cost of $49. There does not appear to be bulk

pricing [1].

A more hands-on Wifi solution would be the Photon. This piece of hardware is designed to be an Internet of Things enabler: it serves as a microcontroller with supporting components that can be wired directly into a sensor circuit, and is equipped with a Wifi chip to broadcast data. The Broadcom Wifi chip - the P0 Wifi module - is found in the Nest, LIFX, and Amazon Dash. The microcontroller is an STM32 ARM Cortex M3. It's programmable using Wiring, which is also used by the Arduino; code can be written on the web or in a local IDE and sent wirelessly to the Photon, so updates are possible remotely. The software is open source. The power requirements are 3.6-5.5 VDC or a USB power source [2]. An advantage of this type of network over the one above is that an ethernet module is not required - the chips simply connect to the internet and put your data in the cloud (a server you get access to through Particle IO). The Photon costs $19 each [3], with no bulk discount.

While the Photon would be appropriate for prototyping with a network on the order of 10 nodes, larger scale networks might be better constructed using the TI CC3200 SimpleLink - a Wi-Fi enabled MCU (microcontroller). An integrated development board including various types of sensors (a “plug it in and go” solution) - the LaunchPad - allows for prototyping but is three times as expensive as the MCU alone [4]. However, implementation on the order of tens or hundreds of nodes would require integrating the SimpleLink into a custom circuit board with the appropriate sensors and components. More specifically, the SimpleLink has an ARM Cortex-M4 microcontroller. Its Wifi chip is compatible with WPA2 Personal and Enterprise security. The battery supply voltage should be in the 2.1 to 3.6 V range, and the chip offers low-power modes such as hibernate and low-power deep sleep. There are guides for application development and programming, which can be done through the Uniflash Standalone Flash Tool for TI microcontrollers. One major advantage for large scale implementation is that cost decreases with quantity ordered: 1-9 units for $16.80 each, 10-24 for $14.78, 25-99 for $13.58, and 100+ for $12.62 [5] [6].
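The tiered pricing quoted above can be captured in a small helper for cost planning; the prices are the list prices cited in [5] [6].

```python
# SimpleLink CC3200 tier pricing quoted above: per-unit price drops
# as the order quantity grows.  Tiers: (minimum quantity, unit price).
TIERS = [(100, 12.62), (25, 13.58), (10, 14.78), (1, 16.80)]

def unit_price(qty):
    """Return the per-unit price for an order of `qty` modules."""
    for min_qty, price in TIERS:
        if qty >= min_qty:
            return price
    raise ValueError("quantity must be at least 1")

def order_cost(qty):
    """Total module cost for an order, rounded to cents."""
    return round(qty * unit_price(qty), 2)

print(unit_price(4), order_cost(4))    # 16.8 67.2  (prototype scale)
print(unit_price(50), order_cost(50))  # 13.58 679.0  (building scale)
```

This is module cost only; a deployed node would also need sensors, a power source, and a custom board.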

Radio Frequency Based Sensor Networks

ZigBee is a standards-based technology that is usually less expensive than Wifi and Bluetooth. It can work at a low duty cycle, sleeping and waking rapidly, which enables long battery life: a normal small coin-cell battery could provide one node with multi-year operation [7]. Compared to Bluetooth and Wifi, it has a relatively low data transmission rate (20-250 kbps). Additionally, ZigBee has a mesh network mechanism whereby nodes within the system are interconnected and self-recovering, so that if a node fails, the other nodes can reconfigure alternative routing paths for the new network structure [8].

One ZigBee integrated solution is the TI CC2538 System-on-Chip (SoC). It offers high processing power, with an ARM Cortex-M3 application processor and a sensor controller engine (SCE). It supports ZigBee and IEEE 802.15.4 mesh networking, enabling a 10-meter communication range and a transfer rate of 250 kbit/s per node [9]. According to the power calculator provided on the official TI website, a product designed with the ZigBee CC2538 (as the ZigBee end


device) operating at 3.0 volts with a typical 230 mAh coin cell could work nearly two years with 144 data transmissions per day (i.e., one transmission every 10 minutes) [10]. The unit price for this SoC is $7.65 [11]. Note that this SoC is a chip that must be integrated into a circuit; it's meant for large scale application. Prototype-level ZigBee modules with leads that can be interfaced with a breadboard circuit and a microcontroller are also available at a cost of $19 [12].

Another option for setting up an RF sensor network is to employ Arduinos with RF transmitters as sensor nodes and an Arduino/Raspberry Pi as a receiver hub, as done by Ben Miller in an electronics article called “Building a Wireless Sensor Network in Your Home” [13]. The sensor nodes each consist of sensors, an Arduino for signal processing, and an RF transmitter and antenna. The receiver consists of an antenna, receiver, Arduino, logic converter, and Raspberry Pi. Arduino is an open-source microcontroller platform widely used for prototype electronics projects. Arduino boards can read digital or analogue signals, such as those from a temperature sensor, and produce outputs. One can send instructions (via an Arduino script developed in the Arduino development environment, or IDE) to the on-board microcontroller unit (MCU) for processing of inputs and production of desired outputs. The most economical Arduino option is the Uno. It's based on the ATmega328P MCU, which has 14 digital I/O pins, 6 analog inputs, and one USB port. The board can be powered via a USB connection or typical external power from an AC-to-DC adapter [14]. One of the most important advantages of the Arduino is that it can be conveniently programmed to achieve all kinds of functions in combination with sensors, transmitters, receivers, and microcomputers.

A Raspberry Pi is a credit-card-sized single-board computer widely used for all kinds of processing work. After receiving the data from the Arduino board, a Raspberry Pi can compile the data into our expected format and provide access for cloud storage or direct subsequent analysis. One Arduino board costs $24.95 [15]. The total cost for this network, however, would also include transmitters, receivers, antennas, sensor components, and a Raspberry Pi ($35.50) [16]. It's estimated that the cost of 1 transmitter and 1 receiver could be over $100 [13].
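The coin-cell battery-life figure quoted above can be sanity-checked with a back-of-the-envelope calculation: lifetime is battery capacity divided by the average daily charge draw from sleeping plus transmitting. The sleep and transmit currents below are assumptions chosen for illustration, not CC2538 datasheet values.

```python
def battery_life_days(capacity_mah, sleep_ua, tx_ma, tx_ms, tx_per_day):
    """Estimate node lifetime from average daily charge draw.

    capacity_mah: battery capacity; sleep_ua: sleep current in microamps;
    tx_ma / tx_ms: current and duration of one transmission burst.
    """
    sleep_mah_per_day = (sleep_ua / 1000.0) * 24.0
    tx_mah_per_day = tx_ma * (tx_ms / 1000.0) * tx_per_day / 3600.0
    return capacity_mah / (sleep_mah_per_day + tx_mah_per_day)

# 230 mAh coin cell, one transmission every 10 minutes (144/day),
# with an assumed 5 uA sleep current and 25 mA for 100 ms per burst.
days = battery_life_days(230, sleep_ua=5, tx_ma=25, tx_ms=100, tx_per_day=144)
print(round(days / 365.0, 1))  # 2.9 -> multi-year operation is plausible
```

The result lands in the same multi-year range as the TI power calculator, which is the point: duty cycling, not battery size, is what makes ZigBee nodes long-lived.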

Table 1: Comparison of sensor network technologies.

Wireless Sensor Tags (Wifi) - Cost: $49 (receiver) + $29 per tag. Advantages: fully implemented sensor network solution. Disadvantages: no development needed (not valid for a class).

Photon (Wifi) - Cost: $19 per node. Advantages: reprogrammable, digital/analogue inputs, low cost, feasible for prototyping. Disadvantages: poor economies of scale.

TI chip (Wifi) - Cost: $17-$12 depending on quantity. Advantages: very low cost, ideal for large scale, low energy consumption. Disadvantages: requires integration into a circuit.

ZigBee (RF) - Cost: prototype level $19/transceiver (no MCU); integrated SoC for large scale $7.65/unit. Advantages: very low energy consumption, long-range communication. Disadvantages: low information transmission rate.

Arduino (RF) - Cost: over $100/unit. Advantages: reprogrammable. Disadvantages: high cost.


2.2 Sensors for Temperature and Humidity This section introduces options for sensors that measure temperature and humidity. It presents the physical

phenomena / mechanism of sensing, the specifications (accuracy, precision, etc.), and the cost for a variety of

different alternatives.

Temperature Sensors:

Temperature sensors operate on two primary principles: voltage due to a difference in temperature, and change in resistance from change in temperature [17].

Thermocouples are made from two different metals joined at a point; they produce a voltage due to the temperature difference between the two ends [18]. Thermocouples are the most commonly used temperature sensors, applied where cost, simplicity, and a wide operating temperature range are vital but high accuracy is not. They are also susceptible to noisy readings given the low voltages involved, and fare worse in linearity and accuracy than the RTD. The price range of commercially available thermocouples on the internet is $4-$12. An example of a popular thermocouple sensor is the uxcell K Type -50-700C Thermocouple Probe Temperature Sensor [19].
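As an illustration of the thermocouple principle, the K-type output near room temperature is roughly linear at about 41 microvolts per degree C. Real readout uses the standard NIST polynomial tables and hardware cold-junction compensation, so the sketch below is a simplification.

```python
# K-type thermocouples produce roughly 41 uV per degree C of difference
# between the hot and cold junctions near room temperature.  A real
# converter uses NIST polynomial tables; this linear model is a sketch.
SEEBECK_UV_PER_C = 41.0

def thermocouple_temp_c(v_measured_uv, cold_junction_c):
    """Approximate hot-junction temperature from measured voltage.

    The voltage reflects the temperature *difference* between junctions,
    so the cold-junction (reference) temperature must be added back.
    """
    return cold_junction_c + v_measured_uv / SEEBECK_UV_PER_C

# 410 uV across the leads with a 25 C cold junction -> about 35 C.
print(round(thermocouple_temp_c(410.0, 25.0), 1))  # 35.0
```

The need for cold-junction compensation is one reason thermocouple readout circuits are noisier and less accurate than RTD circuits.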

Semiconductor sensors incorporate solid state devices like diodes or voltage references, and work on the same operating principle as thermocouple sensors, i.e., they produce a voltage proportional to the temperature difference between two transistors operating at different stable collector current densities.

Semiconductor sensors are popular in embedded systems despite their delicate nature [20]. Types of semiconductor sensors include voltage output, digital output, resistance output, and diode. Semiconductor thermal sensors have accuracies similar to thermocouples (+/- 0.75 degrees C to +/- 4 degrees C) unless they are calibrated, in which case they exhibit improved accuracy. They are also very small, and are a low cost alternative. The major drawbacks when considering semiconductors for this project are their delicate nature and poor interchangeability.

Resistance Temperature Detectors (RTDs) and thermistors work on the same operating principle: a change in temperature changes the electrical resistance of the sensing element, which can be mapped back to a temperature. Class A and B RTDs consist of 100 Ohm platinum wire wound around a cylinder. The data logger inputs a known current and measures voltage to determine resistance; the resistance is then mapped against the temperature-resistance curve for the material used to get the temperature. RTD sensors can also use copper or nickel. RTDs are more accurate than thermocouples and are extremely stable over long periods of time (common RTD drifts are around 0.1 degrees C/year). They also come with high standards for repeatability and respond quickly. Like semiconductors, RTDs have a limited temperature range by broader industrial standards; however, this range (typically -70 to 260 degrees C) still encompasses the range required for the project. The price range of commercially available RTD sensors is $10-$34 [21] [22]. Examples include the AGPtek Stainless Steel PT100 RTD Thermistor Sensor Probe [23] and the Liquid tight RTD sensor, 34 mm probe, 1/8 NPT Thread [24].
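For a 100 Ohm platinum RTD like the PT100 cited above, the resistance-temperature mapping above 0 C is given by the standard Callendar-Van Dusen equation; the sketch below applies the standard IEC 60751 coefficients and inverts the quadratic to recover temperature from a measured resistance.

```python
import math

# IEC 60751 coefficients for platinum RTDs (valid for T >= 0 C):
#   R(T) = R0 * (1 + A*T + B*T^2)
R0 = 100.0        # PT100: 100 ohms at 0 C
A = 3.9083e-3
B = -5.775e-7

def pt100_resistance(temp_c):
    """Resistance in ohms at a given temperature (Callendar-Van Dusen)."""
    return R0 * (1.0 + A * temp_c + B * temp_c ** 2)

def pt100_temperature(resistance):
    """Invert the quadratic to recover temperature from resistance."""
    return (-A + math.sqrt(A * A - 4.0 * B * (1.0 - resistance / R0))) / (2.0 * B)

r = pt100_resistance(100.0)
print(round(r, 2), round(pt100_temperature(r), 2))  # 138.51 100.0
```

This is the conversion the data logger performs after measuring voltage across the element at a known excitation current.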

Thermistors work on the same principle as RTDs, with resistance varying with ambient temperature. Their operating range (-70 degrees C to 150 degrees C) satisfies the requirements of this project. A thermistor's resistance change is very nonlinear but very steep, giving high sensitivity (changes as small as 0.01 degrees C can be resolved) and accuracy around +/- 0.2 degrees C, though they are still not as accurate or stable as RTDs. Typically, they are cheaper than RTDs. Examples of thermistors include the Caldera Spa Temperature Sensor Thermistor Ewgx272 - 71578 ($32.02) [25] and the 10K Thermistor Temperature Sensor | Miniature Stainless Steel ($10) [26].
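Thermistor readout commonly uses the Beta-parameter approximation to linearize the steep, nonlinear resistance curve. The sketch below assumes a 10 kOhm NTC with B = 3950 K, a common catalog value and an assumption here, not a figure taken from the specific sensors cited above.

```python
import math

# Beta-parameter model for an NTC thermistor, temperatures in kelvin:
#   1/T = 1/T0 + (1/B) * ln(R / R0)
R0 = 10_000.0   # resistance at the reference temperature (assumed 10k NTC)
T0 = 298.15     # 25 C reference temperature
BETA = 3950.0   # assumed catalog Beta value

def thermistor_temp_c(resistance):
    """Convert a measured NTC resistance to degrees Celsius."""
    inv_t = 1.0 / T0 + math.log(resistance / R0) / BETA
    return 1.0 / inv_t - 273.15

print(round(thermistor_temp_c(10_000.0), 2))  # 25.0 at the reference point
# For an NTC device, resistance falls as temperature rises:
print(thermistor_temp_c(5_000.0) > 25.0)      # True
```

The steep exponential curve is what gives thermistors their fine resolution over the narrow range relevant to indoor monitoring.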

Humidity Sensors


Humidity sensors are predominantly divided across resistive, capacitive, and thermal conductivity sensing technologies.

Capacitive humidity sensors operate on the principle that their dielectric constant varies with the relative humidity of the air around the sensor. These sensors have low temperature coefficients and can function at very high temperatures. They are produced in high-efficiency semiconductor manufacturing plants and exhibit minimal drift over time. Typically, capacitive sensors give errors of ±2% RH with calibration. These sensors are most commonly used for applications in the industrial, commercial, and weather telemetry spaces [27].

Resistive humidity sensors work on the principle that the impedance of a hygroscopic medium varies inversely (normally exponentially) with relative humidity. They have a number of advantages: they are interchangeable/field-replaceable within ±2% RH, small, cost-effective, and very stable over long periods of time. A significant downside of using these

sensors, however, is their susceptibility to chemical vapors and oil mist. They also give lower accuracy readings

on exposure to condensation if they use a water soluble coating. This, however, should not be a problem within

the scope of our project, given that the setting is on the inside of Hunt Library. Long term stability, low cost, size,

and interchangeability make these sensors ideal for industrial, commercial and residential applications [27].
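The exponential impedance-humidity relationship can be illustrated with a toy model; the Z0 and K constants below are hypothetical calibration values, since real resistive sensors ship with manufacturer calibration curves.

```python
import math

# Toy model of a resistive humidity sensor: impedance falls exponentially
# as relative humidity rises.  Z0 and K are hypothetical calibration
# constants, not values for any real part.
Z0 = 1.0e6   # impedance at 0 %RH, in ohms (assumed)
K = 0.08     # decay constant per %RH (assumed)

def impedance(rh_pct):
    """Model impedance at a given relative humidity."""
    return Z0 * math.exp(-K * rh_pct)

def relative_humidity(z_ohms):
    """Invert the exponential model to recover %RH from impedance."""
    return -math.log(z_ohms / Z0) / K

z = impedance(40.0)
print(round(relative_humidity(z), 1))  # 40.0 (exact round trip)
```

Because the response spans orders of magnitude of impedance, readout circuits typically linearize in the log domain exactly as the inversion above does.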

Thermal conductivity based sensors are durable and extremely resistant to chemical vapors because they are built from inert materials like glass and semiconductor substrates, which also gives them a very high operating temperature range. These sensors are used in situations where the operating temperature is in excess of 200 F, and have an accuracy of ±5% RH at 40°C and ±0.5% RH at 100°C. Given this operating temperature range, applications include kilns, machinery for drying textiles, cooking, and pharmaceutical production; they would thus not be suitable for use in this project [27].

Table 2: Summary of sensor types and performance.

| Sensor Category | Sensor Type | Advantages | Disadvantages |
| Temperature | Thermocouples | Low cost, easy to operate | Low accuracy |
| Temperature | Semiconductors | High accuracy | Unstable; need frequent repair or replacement |
| Temperature | Resistance Temperature Detectors (RTD) | High accuracy, high repeatability, stable | Limited range |
| Humidity | Capacitive Humidity Sensors | High accuracy, stable, able to recover from condensation | Large size, high cost |
| Humidity | Resistive Humidity Sensors | High accuracy, low cost, small size | Unable to recover from condensation |
| Humidity | Thermal Conductivity Humidity Sensors | Stable, resistant to physical and chemical contaminants | Low accuracy, high cost |

2.3 Decision Framework and Final Selection

The first step in choosing a technology was to filter out those which do not meet the technical requirements of the system. For the sensor network, the technical requirements are:

- Ability to interface with analogue and digital sensors
- Ability to be powered by either battery or plug point
- Communication range appropriate for a large academic building
- Scalability from a small (2-4) to a medium-large (10-20) number of nodes

For the sensors, the technical requirements are:

- Accuracy: ±0.5 °C, ±5% RH
- Range: 0-100 °C, 0-100% RH

The optimal choice will meet these technical requirements and also score highest on the criteria of cost, feasibility, and learning objectives.

From the previous sections, we see that five communication technologies are available: the Photon, Wireless Sensor Tags, the TI chip, the Arduino, and the ZigBee. In terms of cost, a good basis for comparison is a 4-node system, since we will take measurements on only one floor and the demand is not large. As shown in the following table, 4 Wireless Sensor Tag nodes cost the least, followed by the Photon, while the Arduinos cost the most. Note that the cost calculation is an estimate, covering the combination of technology, sensors, and power sources.

Table 3: Costing Out Sensor Network Technologies

| Technology | Cost for 4 nodes | Cost for 10 nodes | Cost for 100 nodes |
| Wifi - Photon [28] | $179.20 | $431.50 | $4,216.00 |
| Wifi - Wireless Sensor Tags [29] | $165.00 | $339.00 | $2,949.00 |
| Wifi - TI Chip [30] | $192.65 | $467.57 | $4,609.36 |
| RF - Arduino [31] | $250.19 | $489.89 | $4,085.00 |
| RF - ZigBee [32] | $181.40 | $453.50 | $4,290.00 |

In terms of feasibility, we compare how well each product suits our project. The most suitable communication technology appears to be WiFi: it is available everywhere in Hunt Library, and we can access and use it easily. RF-based products such as the ZigBee and the Arduino, on the other hand, require an additional ethernet gateway, which would increase both the difficulty and the budget. Another requirement is that the product be programmable, since we will write our own code for this specific environment. Here WST ranks last because it is not open source, while the Photon offers the simplest programming workflow. We also need to consider how to power the network. We could power it by USB or battery; in our case battery is preferable, since there are not enough USB plug points in Hunt. On this count the RF-based technologies, though they consume less energy, are not the best choice, as they all require a DC-to-AC converter. The last concern is whether the product imposes few restrictions on sensor choice. A good platform has both digital and analog input/output and a wide range of power outputs. Here the Photon outcompetes all the other products.

As mentioned before, WST performs relatively well on cost, but it is still not a good option in our case. WST is a closed, complete system product, ranking last on learning objectives and giving us the least opportunity to study communication mechanisms. The second-best option on cost, the Photon, is an ideal product: a tiny Wi-Fi development kit for prototyping and scaling, it is reprogrammable and can be connected to the cloud, offering more learning opportunities.

To reach a final decision, each factor (cost, feasibility, learning objectives) is assigned a weight to enable a quantitative comparison across the sensor network technology options. The product with the highest score is our final choice. Among the factors, learning objectives is less important, so its weight is 2, while the other two factors each weigh 3.

Figure 3: Comparison of sensor network technologies for final decision.

From the bar chart above, we find that the Photon has the highest score and suits our project best.

As for sensors, several factors are important: cost, supply voltage, accuracy, and prototyping feasibility. Specific to our case, another question is also worth considering: whether to buy a temperature sensor and a humidity sensor separately, or a combined T/H sensor instead.

In terms of cost, the combined T/H sensor has an advantage over two separate sensors. For example, a typical thermal sensor (RTD Platinum-Clad Ni Wire) costs $13.99 [38] and a typical humidity sensor (Sunkee AMT1001) costs $8.96 [39], for a total of about $22.95, while a combined T/H sensor such as the RHT03 costs only $9.95 [40].

Based on the voltage supplied by the Photon, two sensors are suitable: the RHT03 and the HDC1080 [41]. However, the HDC1080 is only a tiny chip and needs additional components before it can be connected to the circuit. The RHT03, with a supply range of 3.3 to 6 V and an accuracy of ±0.5 °C (T) / ±2% RH (H), meets all of our requirements and can be placed into the circuit easily. It is a digital sensor, so it interfaces easily with the Photon and allows straightforward extraction of temperature and humidity. It is also factory calibrated. Considering all of this, the RHT03 combined sensor is our final choice.

3 System Design

As discussed in the previous section, the Particle Photon was selected as the wifi-enabled microcontroller unit to be used in the sensor nodes. The choice of the Particle Photon shapes the flow of data after collection: a convenient feature of the Photon is that data can be sent to the Particle Cloud, a server hosted by the parent company Particle, where it is stored temporarily before being fetched by users. Having introduced this, we can now present the system schematic, which summarizes how information flows from the sensors to the final steps of data analysis and visualization:


Figure 4: System diagram.

Temperature and humidity are measured in the sensor nodes, which send these measurements to the Particle Cloud server. A data logger fetches the measurements from the server and saves them so they can be used in data analysis and visualization. These latter two components enable the user to make data-driven decisions. The currently implemented system contains six sensor nodes (the three shown in the figure are for demonstration purposes only). The subsequent subsections explain the hardware and software design of these nodes; later subsections discuss the data management system.

3.1 Sensor Node Design

The sensor nodes are the physical units that are placed to take measurements. Ideally, they are mobile so they can be placed and moved as needed. They should be able to take measurements reliably and pass the resulting data on to a central point.

3.1.1 Hardware

The circuit design for the sensor nodes appears below:

Figure 5: Circuit diagram

The white piece is the digital temperature and humidity sensor, chosen for its reasonable cost (~$10) and high accuracy (±0.5 °C for temperature, ±2% for relative humidity) [33]. It has power into row 3 (its first pin, red wire). A pull-up resistor (1 kΩ) runs from its second pin in row 4 to power. Finally, its fourth pin (row 6, black) is grounded. A 100 nF capacitor (green) is added between power and ground for smoothing. Its second pin (row 4) is connected to digital input D5 on the Photon (row 17, purple wire). Power is supplied from the 3V3 pin (row 22, red). The Photon is powered via battery. The input range for Vin on the Photon is 3.6 to 5.5 volts. Four AA batteries in series produce 6 V, which is then dropped to 4 V through a linear regulator, the MCP1702 (black part in the middle of the circuit board). This allows for the voltage decline of the batteries over time (discussed further below).

The data sheet for the sensor specifies that the device is factory calibrated and ready to use. For the first two sensor nodes, studies of the accuracy and precision of the temperature readings were conducted. A high-precision temperature sensor was set in the CEE classroom and allowed to come to a steady-state reading for half an hour. Its temperature agreed with the reading from the digital sensor to within 0.5 °C. The precision of the sensors was also validated: when placed adjacent to one another, two sensors register readings within 0.5 °C and 2% relative humidity of each other. The relative humidity accuracy was not validated, as a reference probe could not be accessed. The sensor does, however, respond to clear changes in humidity (for example, if a human finger touches it). The other four sensors were assumed to be accurate.

The following is a Bill of Materials reflecting the components necessary for each node, to be scaled to the number of nodes to be implemented. The parts are linked to a source for purchase.

Table 4: Bill of Materials, per node

| Part | Quantity | Unit Cost | Cost |
| Particle Photon | 1 | $19.00 | $19.00 |
| 1 kOhm resistor | 1 | $0.19 | $0.19 |
| 1 nF capacitor | 1 | $1.10 | $1.10 |
| 4xAA battery case | 1 | $1.60 | $1.60 |
| 1 uF capacitor | 2 | $0.05 | $0.10 |
| LDO MCP1702 | 1 | $0.48 | $0.48 |
| RHT03 sensor | 1 | $9.50 | $9.50 |
| Half breadboard | 1 | $2.67 | $2.67 |
| Breadboard wire | ~10 | $0.01 | $0.10 |
| Jumper cables to sensor | 6 | $0.07 | $0.42 |
| Total | | | $35.16 |

The total cost of $35.16 is nearly $10 less than the estimated cost listed in section 2.3. It is also less than that of the Wireless Sensor Tags technology mentioned in the technology survey. The battery design, which enables reliable power at a small cost, is a major factor in driving down the price. The cost comparison of the different powering options is discussed in the subsection below.

3.1.2 Power Source Design

To power the Photon, there were two options: USB power (wired) and battery power (wireless). USB power involves a 5 V, 1 A "wall wart" charger that would need either a USB extension or a power extension cord to reach the location of the node. There are clear disadvantages to USB power: extension cords would need to run from the plug point to the node, which may be obtrusive or even present safety hazards. USB would have the following cost components: wall wart, $3.25 [34]; USB extension (10 ft), $5.80 [35]; micro USB (3 ft), $2.25 [36]. This gives a total of $11.30 per node. It could be higher if cable lengths of more than 10 ft are required. The cost to tape down cables in certain areas, or to run them overhead, would be an additional component.

Batteries, on the other hand, would allow nodes to be placed in the desired location without consideration of

plug points or obtrusive wire runs. With this improvement in mobility and safety, the cost is the only unknown

factor. To determine cost, a battery design solution is required.

Let us first define requirements for a battery power design solution. The measured current consumption during

Wifi connection is about 90 mA, while it is nearly zero (160 uA) during sleep mode. Thus, the battery needs to

supply up to 100 mA of current, and meet the voltage input requirement for the photon (3.6 to 5.5 V). It should

be low cost, not consume much space, have a long life (2 weeks +), and provide positive and negative leads to be

plugged into a breadboard. While preexisting battery solutions such as the Photon Battery Shield paired with a

LIPO battery [37] exist, their cost exceeds that of the USB power solution ($13+). A low cost battery solution can

leverage AA alkaline batteries in series. Each AA battery provides 1.5 V, and 4 in series provides 6 V. A linear

regulator can be used to output a constant 4V. The regulator consumes about 2 uA of current (negligible) and

has a dropout voltage of about 0.6 V (the voltage change across the regulator itself) [38]. To be conservative,

let’s assume the photon should never receive less than 4 volts. With the 0.6 V dropout, the battery should never

output less than 4.6 V. Four, as opposed to three, batteries are in series since voltage drops over the lifetime of

the battery – three batteries will hit a voltage lower than 4.6V. The figure below shows voltage drop over the

lifetime of a 1.5 V AA alkaline battery. Note that the capacity in Ah (amp-hours) changes with discharge rate as

well as the voltage drop.


Figure 6: Voltage and capacity changes with discharge rate and usage. Image source: [39]

The method of Amp-Hours is used to find an estimate of the life of the batteries for our application. Thus, we

need to know the discharge rate and the capacity of the battery. The typical number of mAh in an Alkaline Long-

Life (Duracell Coppertop) AA battery is about 2100-2200 mAh, when discharged at 100 mA [40]. As the discharge

rate drops, the life of the battery increases, and the rate of voltage loss also decreases, enabling an even voltage

until very close to full discharge (see above figure). An energizer AA battery, discharged at a rate of 25 mA is

supposed to have about 3000 mAh [41]. Thus, depending on the battery brand, capacity can vary.

For this case, the discharge rate can be found as an average. Taking a measurement requires about 10 seconds: a period of connecting to wifi, the measurement itself, and a five-second buffer to make sure enough time is given to send. Thus, the average discharge rate is:

(10 s / 10 min) x (1 min / 60 s) x 90 mA = 1.5 mA

This is extremely low, so the maximum capacity of the battery should be achievable. However, to be conservative, let us assume we are at the 100 mA discharge rate (the right panel in Figure 6). For a 4.6 V minimum, each battery must stay above 1.15 V. From Figure 6, the capacity delivered by the time the battery reaches 1.15 V is about 1.6 Ah.

Then, we find the expected life as:

Expected Life (hrs) = 1600 mAh / 1.5 mA = 1066.7 hrs ≈ 44.4 days

Note that this calculation is conservative in its assumed battery capacity and minimum voltage: it is a lower bound for using AA batteries. This is a relatively long life for a prototype (about 1.5 months), and it could be extended further if C or D batteries (same voltage, higher capacity) were used, or if measurements were taken less often (each half hour or hour could be justified instead of each 10 minutes). For example, an Energizer D battery (also 1.5 V) discharged at 25 mA has a capacity of about 17,000 mAh [41]. If measurements were taken once per hour, the life could be increased by a factor of about 50, making a life of about 6 years possible.
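The battery-life arithmetic above can be collected into a small helper. The sketch below is illustrative only: the function names and default values (1600 mAh usable capacity, 90 mA active draw, 10 s awake per 10-minute cycle) are our assumptions from the worked example, not part of any Photon API.

```python
# Rough battery-life estimate for a duty-cycled sensor node.
# Assumptions (from the report's worked example): the node draws
# `active_ma` while awake for `awake_s` seconds out of every
# `period_s`-second cycle, and sleep current is negligible.

def average_draw_ma(active_ma=90.0, awake_s=10.0, period_s=600.0):
    """Average current draw in mA for a duty-cycled node."""
    return active_ma * awake_s / period_s

def expected_life_days(capacity_mah=1600.0, active_ma=90.0,
                       awake_s=10.0, period_s=600.0):
    """Expected battery life in days, ignoring sleep current."""
    avg_ma = average_draw_ma(active_ma, awake_s, period_s)
    return capacity_mah / avg_ma / 24.0

if __name__ == "__main__":
    # Report's case: 1.5 mA average, about 44.4 days on 1600 mAh.
    print(round(average_draw_ma(), 2))       # 1.5
    print(round(expected_life_days(), 1))    # 44.4
```

Plugging in a D battery's ~17,000 mAh and a 60-minute period reproduces the multi-year lifetimes discussed above.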

The components required for this battery solution are shown in the circuit diagram, and include a battery holder

case ($1.40 [42]), 4 Alkaline AA batteries ($0.24-$0.50 per battery [43] -> $1.00-$2.00), and a linear regulator

($0.48 [44]). A total cost of about $3 - $4 per node is expected. The cost is three times lower than the USB

solution, thus making the argument for batteries compelling.


3.1.3 Software

The software component of the sensor nodes centers on the program for the Photon microcontroller. The program must meet the following requirements: it must read digital data from the sensor, push this data to the cloud through wifi, and enter a sleep state to conserve battery power. It must keep a timer in the sleep state and wake up every 10 minutes to measure and send data.

Before delving in more specifics, let us introduce some important definitions we will be using: we have

mentioned that the Particle Photon is a wifi-enabled microcontroller that can pass data to the server of the

parent company – the Particle Cloud. There are two methods for passing data – the first is using a Particle

Variable. The photon creates a variable, and can update the variable with data. The variable can be accessed

from the server. Variables are stored temporarily on the server and can be fetched to a computer or database. If

the Photon that creates the variable remains connected to the internet, the variable can be accessed at any time

on the cloud. However, if the Photon goes into sleep mode, the variable is lost on the cloud. An event is similar to a variable, except that other Photons or computers can subscribe to an event name and listen for events. In this fashion, the Photon can wake up, take a measurement, publish data to the event, and then go back to sleep. Other devices listening for the event will hear it and can receive its accompanying data. Thus, our software leverages the event functionality. This will be important later when discussing the data logging aspect of the system.

Sources used in developing the program for the Photon include the RHT03 library example [45] as well as

documentation of a similar example which used the Spark Fun Inventor’s Kit to make an Environment Monitor

[46]. The latter used the same sensor as we are using and thus helped familiarize us with use of the RHT03

library. Pseudo code for the program is as below:

wake up
take T,H measurement
while measurement failed:
    brief delay
    try again
publish to event: node#,T,H
sleep for 10 mins

The error-handling loop repeats the measurement until it succeeds. Without it, failed measurements appear in the data at a higher frequency. The full code (C++) is included in the Appendices.

Note that the above design is the final design, which involved 6 nodes. Before arriving at it, a two-node system running on USB power used Particle variables instead of events to publish data, and did not require sleep mode. The same two-node system was then powered by battery, and events were used. However, due to a development lag on the data-logger side, we still needed to fetch data from Particle variables, so we used a 'mother' Photon that listened for events from the nodes and saved the resulting data to variables on the cloud. The details of these previous designs can be found in the Phase 2 report.


3.2 Data-logger Design

The data logger must retrieve data sent from the sensor nodes to the cloud and ensure it is saved. Our final design uses a Raspberry Pi for this. The selection of this component and the software used for its functioning are described in this section.

Figure 7: Data-logger conceptual diagram.

3.2.1 Machine Selection

The data logger must meet the following criteria: it must always be running, be able to connect to the internet, listen for the events and receive their data, and store that data or pass it on to be stored safely.

Our team experimented with a variety of different tools to serve as a data logger, including Google Sheets (with

Google Apps Scripting), the Carnegie Mellon Linux servers, a Linux computer offered by our professor, and a

Raspberry Pi (RPi). Google Sheets experienced issues with allowed computing resources – an error with

description “Service using too much computer time for one day” kept recurring. We found the RPi to be

particularly well suited for this application – it is a Linux machine from which we can run python scripts to do the

data collection. It has 8 GB of storage (depending on the SD card) and runs on low power (5 W). It can be accessed remotely via SSH for debugging or script changes. Also important: since we own the device and are

the root user with ‘sudo’ permissions - we can install programs and packages as needed. For example, we

installed pip (python package installer) which made installing the various python packages we used much easier.

On the Linux servers and Linux computer mentioned before, installing python packages became an issue due to

privileges and the complications of local installs.

The Raspberry Pi can either store or pass on the data. Saving the data can be as simple as writing rows to a CSV file, which is the approach we used. Another option is to push the data to a database (SQL); this option was explored but later abandoned due to technical difficulties and a general lack of need in the prototype system. To save the data, we had to ensure there was enough space. With 8 GB of storage (which depends only on the size of the SD card), it had plenty of room compared to the estimated 0.6 MB our data would require for two weeks of collection:

Data Storage Calculation:


Assume there are 20 sensors (a high estimate). Each send will include a variable name and a piece of data as a string. Variable names can be as small as a number for each sensor node and a letter indicating temperature (t) or humidity (h), e.g. t1, h1, t2, h2, etc. This will be 2 bytes. The sent data will be a string of the temperature/humidity value: five characters (tens, ones, decimal point, two decimal places), e.g. t1=50.01 F. Each character is 1 byte, so a total of 5 bytes. So, each send for each node will be 2 x (2 + 5 bytes) = 14 bytes. With 20 sensors, this will be 20 x 14 bytes = 280 bytes per send. Assuming we send every 10 minutes, the data size for two weeks of monitoring would be:

280 bytes/send x (24 hr x 60 min/hr / 10 min/send) sends/day x 14 days = 564,480 bytes = 564.48 kB = 0.56448 MB
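The storage estimate can be reproduced in a few lines of Python. This is a sketch of the arithmetic above; the byte sizes per field are the report's assumptions, not measured values.

```python
# Storage estimate for the data logger, following the report's assumptions:
# 20 nodes, 2 readings (T and H) per send, each a 2-byte name plus a
# 5-byte value string, one send every 10 minutes, for 14 days.

NODES = 20
BYTES_PER_READING = 2 + 5          # variable name + value string
READINGS_PER_SEND = 2              # temperature and humidity
SENDS_PER_DAY = 24 * 60 // 10      # one send every 10 minutes

bytes_per_send = NODES * READINGS_PER_SEND * BYTES_PER_READING
total_bytes = bytes_per_send * SENDS_PER_DAY * 14

print(bytes_per_send)   # 280
print(total_bytes)      # 564480
```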

Finally, the data logged onto the Pi could be accessed at any time via remote SSH file transfer, but a more convenient method (and a safe practice) is to back up the data to a Google Drive or Dropbox account. This functionality is implemented; details follow.

3.2.2 Software

Now we discuss the specifics of the python script that does the listening and logging. First, let us introduce server-sent events (SSE). The Particle event that we publish data to can be listened for using SSE. In SSE, a stream is set up through which events flow. This can be easily implemented in Python. The actual data that arrives in the stream with an event is a Unicode string containing the sensor node, variable names, and data, e.g.:

event: 12745_data

data: {"data":"temp2:81.14humd2:37.10","ttl":"600","published_at":"2016-05-02T20:52:04.846Z","coreid":"36003a000b47343432313031"}

The python script must parse this information out. Note that there are six nodes, each publishing events at a different time but every ten minutes. Some drift occurs in timing, since error handling on the node side can take time. Data analysis simply interpolates values at the desired time, so as long as the measurement resolution is small (10 minutes) and data comes in roughly every 10 minutes, the exact timing of the data does not matter.
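A minimal parser for the event payload shown above might look like the following. This is an illustrative sketch: the field layout (`tempN:...humdN:...`) is taken from the example event, but the function name and regular expression are our own, not from the project's actual script.

```python
import json
import re

# Parse a Particle SSE "data:" line like the example in the text:
# {"data":"temp2:81.14humd2:37.10","ttl":"600",...}
def parse_event(data_line):
    """Return (node, temperature, humidity) from one SSE data line."""
    payload = json.loads(data_line)
    m = re.match(r"temp(\d+):([\d.]+)humd\d+:([\d.]+)", payload["data"])
    if m is None:
        return None  # malformed or blank event; caller skips it
    return int(m.group(1)), float(m.group(2)), float(m.group(3))

line = ('{"data":"temp2:81.14humd2:37.10","ttl":"600",'
        '"published_at":"2016-05-02T20:52:04.846Z",'
        '"coreid":"36003a000b47343432313031"}')
print(parse_event(line))   # (2, 81.14, 37.1)
```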

During design iterations, we tried a few different methods to listen for and save this data. We first tried saving the data to variables, and each time all six came in, we would save the variables' values as a row to a CSV file. This fails, however, if one of the nodes does not come in. We then tried setting a timer so that every ten minutes we save all variables to the file, writing 'fail' for any variable that was not updated. In both of these methods, however, if the script is saving data while an event comes in, it may miss the event. Thus, it is better to handle each node separately. Since each node publishes only every 10 minutes, saving right after its event arrives leaves plenty of time. Moreover, separating the collection makes error handling easier: there is no need to check whether all variables were updated; the data is simply saved as it comes. However, running 6 scripts is inefficient and inelegant. A better solution is to separate the listening and saving behavior. Thus, we need multithreading.


In multithreading, threads run concurrently, sharing the same memory space. A queue can be constructed to pass data between the threads. This is how our solution works: we have a listener thread and a logger thread. The listener listens for events on the event stream, and when an event comes in, it saves the relevant data to a queue. The logger thread then pulls from the queue and saves the data to a CSV file. The logger checks the queue every 30 seconds.

Figure 8: Illustration of multi-threading used in the data logging script with blocks and pseudo code.
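The listener/logger split can be sketched with Python's `threading` and `queue` modules. This is a simplified stand-in: the `events()` generator here yields canned strings where the real script reads the Particle SSE stream, the logger appends to an in-memory list rather than a CSV file, and the polling interval is shortened from 30 seconds; all names are ours.

```python
import queue
import threading
import time

q = queue.Queue()  # shared between listener and logger

def events():
    # Stand-in for the SSE stream: yields a few fake readings.
    for s in ["temp1:72.50humd1:21.00", "temp2:81.14humd2:37.10"]:
        yield s

def listener():
    # Push each incoming event's data onto the queue.
    for data in events():
        q.put(data)

rows = []

def logger(poll_s=0.1, rounds=5):
    # Periodically drain the queue and "write" rows (the real script
    # appends them to a CSV file every 30 seconds).
    for _ in range(rounds):
        while not q.empty():
            rows.append(q.get())
        time.sleep(poll_s)

t1 = threading.Thread(target=listener)
t2 = threading.Thread(target=logger)
t1.start(); t2.start()
t1.join(); t2.join()
print(rows)
```

Because only the listener blocks on the stream and only the logger touches the file, an event arriving mid-write can no longer be missed.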

It is also important to point out that after some time running (2-3 days), the listening thread can freeze. This is likely due to a broken event stream, although we are not sure. The solution is to have another thread that times how long it has been since an event was logged; if a few hours pass with nothing coming in, it creates another instance of the listening thread. In Python, you cannot simply kill off the older, frozen thread, but a nightly restart of the script can clear it out; this is not expected to occur more than once per day.

The python script is found in the appendix.

The above information is specific to the latest, 6-node system. The first data-logging design, however, was for a two-node system that made use of Particle variables. In that case, server-sent events were not necessary: every 10 minutes, the logging script would get the variables' values from a URL and save them. The details of this design can be found in the Phase 2 report.

3.3 Design Limitations and Potential Improvements

Photon Programming

Around finals time, it was discovered that two of the photons were not reporting measurements at one point.

We investigated and found that in their particular location and elevation, there was a wifi ‘dead zone’. One node

had run out of batteries after continuously attempting to connect, while the other was at such a low voltage that the Wifi functionality was disabled. We replaced the batteries and placed the nodes back in their

respective locations, but the problem persisted – measurements were not coming in. After removing the nodes

from the top of the bookshelves and bringing them to standing level they connected and resumed measuring

data. Thus, we concluded that the issue was a wifi dead zone. It is likely that increased demand during finals

time stressed the wifi network, since the nodes had been successfully measuring data for 7 days before this

occurred.

The solution to this issue would be to start a timer in the code before connecting to the wifi. If the time to connect exceeds, say, 2 minutes, the Photon could shut down. Measurements could still be taken and simply stored on board the microcontroller; the next time the Photon is able to connect to the internet, it would send the logged data. This also suggests an approach that could save battery life: wifi could be activated only every n measurements, where n is chosen so that the desire for real-time data is satisfied to some degree and the Photon can store the data before transmitting it. Since connecting to wifi uses more power, battery life would increase if, for example, n=10 and the wifi were activated, and the measurements sent, only once in every ten cycles.
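The buffer-and-send-every-n idea can be sketched as follows. This is a behavioral sketch in Python (the real implementation would be Photon firmware in C++); the class and method names and the batching logic shown are our assumptions, not existing project code.

```python
# Sketch of the proposed "send every n measurements" scheme: readings
# are buffered locally and transmitted in a batch on every n-th cycle,
# so the radio (the expensive part) is powered up only once in n cycles.

class BufferedNode:
    def __init__(self, n=10):
        self.n = n
        self.buffer = []
        self.sent = []   # stands in for data published to the cloud

    def measure_and_maybe_send(self, reading):
        self.buffer.append(reading)
        if len(self.buffer) >= self.n:
            # In firmware: connect to wifi (with a timeout), publish
            # the batch, then disconnect. Here we just move it over.
            self.sent.extend(self.buffer)
            self.buffer.clear()

node = BufferedNode(n=3)
for t in [72.1, 72.3, 72.2, 72.5]:
    node.measure_and_maybe_send(t)
print(node.sent)     # [72.1, 72.3, 72.2]
print(node.buffer)   # [72.5]
```

A connect timeout in the send step would also cover the dead-zone case described above, since unsent readings simply stay in the buffer until the next successful connection.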

Data Logging

It would be ideal to kill off frozen event stream listener threads instead of having to do a nightly restart of the

script. A shell script that could find the process ID for frozen listener threads and kill them off would work for

this.

Also, the listener/logger design could be improved by pushing more of the data parsing to the logger thread. Currently, the listener thread checks that the event is not blank and that it contains the relevant data (temperature and humidity). Instead of the listener thread then parsing the temperature and humidity out of the data string, the logger could do this (in addition to its prior function of saving the data to a CSV file). While there have not been any issues with missing data due to parsing (it takes fractions of a second), this would be a safe move.

Future Direction

The design outlined here works well for this prototype system. The Photon, Raspberry Pi, breadboards, and

other hardware and software components of the system are ideal for a prototype. However, a more mature

design would likely require a more concise, inexpensive, and robust hardware design and a more robust

software and data handling design. A major dependence of the prototype is on a preexisting wifi network to

connect to. A more mature design might leverage radio frequency transmission to extend range and reduce such

a dependence. Among the technologies from the survey, a Zigbee network should be explored considering the

low power requirements, long battery life, and long range. If it is still desired to use Wifi but to improve node

design and scale the system, the TI wifi MCU mentioned in that technology survey section would be ideal, since

cost decreases with quantity ordered.

4 Data Collection Results & Graphical Exploration

Data collection was primarily used as a means to test the reliability of the system and get a preliminary map of

the spatial and temporal profiles of temperature and humidity. We focused on the fourth floor of the library,

where, according to user interviews, temperatures are problematic. Placement of the nodes was on top of

bookshelves, which potentially exposed them to the top of a thermal gradient resulting in higher temperatures.

Nearby air supply vents could have had an impact on readings. We elaborate on how more accurate space


conditions can be measured later in this section. However, for the scope and timeframe of this course, we judged our placement of nodes to be satisfactory considering our objectives. We needed an unobtrusive implementation that would not be exposed to potential tampering.

The approximate location of sensors for the six node system is shown below.

Figure 9: Placement of sensor nodes.

First we collected data with the two node prototype system at locations 1 and 4 in the above diagram for nine

days. Below the resulting measurements are plotted:

Figure 10: Data Collection for the 2 node system – temperature.


Figure 11: Data collection for 2 node system - humidity.

Data was collected for 11 days for the 6 node system. Below the results are presented. Note there is a gap of

about 1 day between 4/26 and 4/27 – this was when the data logger script was redesigned to include threading

as described in the previous section. Also note that nodes 4 and 5 stop at around 4/30 since there was a wifi

dead zone in the library at their locations.

Figure 12: Six node system data collection - temperature.


Figure 13: Six node system data collection - humidity.

There are several observations that can be made from these data collection results. We will discuss temperature

first. For the two node system, note that the first several days show relatively steady values of temperature until

the last two days when temperature increases. A diurnal pattern emerges where temperature increases (2-3 F)

and peaks at noon and then decreases. For the six node system, this pattern continues. The average

temperature for the nodes varies substantially – up to a 5 F temperature range.

One might expect nodes that are near each other in location to have closer readings than nodes that are further

away. Moreover, one might expect nodes connected by open air, as opposed to physical boundaries, to have similar measurements. Examining Figure 9, the Fine and Rare Book Room sits between nodes 2 and 5, so higher temperature differences might be expected between these nodes than between nodes 1 and 4, or 3 and 6, which have open air between them. Indeed, examining Figures 10-13, nodes 5 and 6 on the Northeast side of the library have similar temperature results. Nodes 1 and 4, and nodes 3 and 6, are closer in magnitude than nodes 2 and 5. Nodes 2 and 5 show the largest difference in average temperature: 5 F.

The identified diurnal pattern of indoor temperature could align with outdoor temperature changes. While the expectation might be that indoor temperature remains steady despite outdoor conditions, we show below that this is not the case; indeed, outdoor temperature peaks correspond to indoor temperature peaks.
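One way to go beyond visual comparison is to compute a correlation coefficient between the indoor and outdoor series. A minimal sketch with synthetic stand-in series (not our measured data):

```python
import numpy as np

# Hypothetical hourly samples: an outdoor temperature with a diurnal swing,
# and an indoor series that tracks it weakly around a setpoint, plus noise.
rng = np.random.default_rng(0)
hours = np.arange(48)
outdoor = 55 + 10 * np.sin(2 * np.pi * (hours - 6) / 24)   # peaks mid-day
indoor = 74 + 0.2 * (outdoor - 55) + rng.normal(scale=0.3, size=48)

# Pearson correlation between the two series; values near 1 indicate that
# indoor peaks line up with outdoor peaks.
r = np.corrcoef(outdoor, indoor)[0, 1]
```

A high value of `r` would quantify the peak-to-peak correspondence that the figure shows visually.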


Figure 14: Indoor and Outdoor air temperatures. Outdoor temperature data provided by the Center for Building Performance and Diagnostics at Carnegie Mellon University.

Note that for at least three of the six nodes, the temperature exceeded recommended indoor levels, which are 68-76 F for winter and 73-79 F for summer [47]. However, as mentioned before, the measured temperatures could be higher than space temperatures due to elevation.

Now examining humidity, note that for the two node system the two humidity series are relatively close in magnitude. There are occasional increases and decreases producing peaks around a baseline value of about 20%. At one point the relative humidity reaches 40%, and for a large portion of the data it stays between 15 and 20%. For the six node system, nodes 1 and 2 show a similar RH range as in the two node system (same locations). The other four nodes show higher values, mostly in the 30-50% range. All six nodes rise and fall together, maintaining an offset from each other. Recommended indoor humidity levels range from 30-65% [47]. If the measured values are indicative of the humidity in the occupied spaces, then the spaces are on average too dry. This may be desired in a library where books must be kept in careful condition, but the space is also heavily occupied by studying students. The indoor humidity may be affected by outdoor humidity. Let's investigate visually:


Figure 15: Outdoor and indoor relative humidity measurements. Outdoor humidity data provided by the Center for Building Performance and Diagnostics at Carnegie Mellon University.

Here the relationship is less clear: some, but not all, peaks and valleys in outdoor humidity map to corresponding features in the indoor humidity.

As future work, we suggest placing the nodes at locations where space temperatures can be measured more accurately. Certain issues, such as exposure to air from supply vents and varying air velocities, could remain, but placement at approximately 4-6 ft should reflect the general space temperature more accurately. We also suggest encasing the nodes so they are tamper-proof and placing them in spaces where students work.

5 Modeling for Prediction and Inference

Having visualized the data and some of the relationships between indoor and outdoor temperature and humidity, we now move on to a more quantitative data analysis component of the project. The objective of this effort is twofold:

- Determine a ranked order of causal factors that influence the internal temperature and humidity inside Hunt Library. This information would help inform any corrective action to mitigate the effect each of these factors has on internal temperature and humidity.

- Build predictive algorithms that can be used to make daily predictions of internal temperature and humidity inside Hunt Library. For individuals working at Hunt, this would help address the unpredictability of internal conditions on a day-to-day basis and make for a more pleasant experience.

5.1 Training Data

The training data for the artificial neural network consists of two primary categories of data points: data corresponding to the input layer and data corresponding to the output layer. The output layer data was acquired from the sensor network we developed and consisted of indoor temperature and humidity data at 10-minute intervals over a 20-day time span. For the corresponding input layer data, we identified two subsets of parameters


that could influence the output layer data: internal factors and external factors. The following parameters have been considered potentially important inputs to the algorithms:

External Factors
- Outdoor Temperature
- Outdoor Humidity
- Wind Speed
- Solar Radiation

Internal Factors
- Return Air Temperature
- Hot Air Supply Temperature
- DA Temp (Fresh Air Temp)
- Mixing Damper
- HD and CD Actuator Position

Outdoor factors, namely external temperature, humidity, wind speed, and solar radiation, were acquired from the Center for Building Performance and Diagnostics at Carnegie Mellon. The center collects this data at 5-minute intervals and represents an ideal primary data source given that all data is collected in close proximity to Hunt Library. The data on the internal factors, we believe, is crucial to making very accurate internal temperature and humidity predictions, given that it is a true representation of the HVAC system condition, which is likely to be a stronger factor in accurate prediction than the external factors. We coordinated with Prof. Pine Liu from Civil & Environmental Engineering, Denise McConnell from Facilities Management Services, and Jingkun Gao, a Ph.D. student in the Department of Civil & Environmental Engineering, to better understand and seek access to the IBM Smart Campus Initiative data collected within Hunt. While we were able to get access to the real-time data, we did not have sufficient time to set up a mechanism to access all trending data over the period of interest. Accessing this internal data and plugging it into the algorithm is, therefore, going to be part of the future work on this project.

The objective of the data import section of the iPython notebook was to retrieve data from the Center for Building Performance and Diagnostics, as well as the sensor data, and combine them into a single time-aligned data frame. Once the data frame was in place, we used approximately 90% of the input data (collected 04/22/2016-05/01/2016) for training the algorithms and the other 10% (collected 05/01/2016-05/04/2016) for testing.
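Assuming pandas is available, the alignment step can be sketched by merging each 10-minute sensor reading with the nearest 5-minute outdoor reading; the frames and column names below are illustrative, not our actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings at 10-minute intervals (output-layer data).
sensor = pd.DataFrame({
    "time": pd.date_range("2016-04-22", periods=6, freq="10min"),
    "indoor_temp": [74.1, 74.3, 74.2, 74.5, 74.8, 75.0],
})

# Hypothetical outdoor readings at 5-minute intervals (input-layer data).
outdoor = pd.DataFrame({
    "time": pd.date_range("2016-04-22", periods=12, freq="5min"),
    "outdoor_temp": np.linspace(55.0, 60.5, 12),
})

# Align each sensor reading with the nearest outdoor reading in time.
merged = pd.merge_asof(sensor, outdoor, on="time", direction="nearest")

# Chronological train/test split, roughly 90/10 as in the report.
split = int(len(merged) * 0.9)
train, test = merged.iloc[:split], merged.iloc[split:]
```

Because the split is chronological rather than random, the test set mimics predicting "future" days from past days.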

5.2 Prediction Models

Our approach to this problem has focused on preliminary identification of all parameters that might influence the overall temperature and humidity. Ideally, any automated multi-purpose algorithm used for internal temperature and humidity prediction must be capable of accounting for and studying the influence of a number of causal factors and input parameters, and the degree to which they influence changes in internal temperature and humidity in Hunt. In addition, it must at the very least show potential for accurate prediction of internal temperature and humidity.


5.2.1 Artificial Neural Network

While there is little literature on internal temperature and humidity prediction within buildings, a number of models, including time-series models, Fourier series models, regression models, and Artificial Neural Network (ANN) models, have been employed for load prediction [48]. We chose to employ a forward-backward artificial neural network algorithm for this study. The ANN algorithm, in essence, seeks to replicate the neuron transmission network of the human brain. It has some distinct advantages: it can, over time, learn non-linear relationships between the input parameters and the network outputs, eliminating the need to develop any understanding of the statistical relationships between the input parameters and the predicted variables. Further, ANNs have been shown to perform better for building load prediction than traditional learning approaches like regression [49].

5.2.1.1 Mechanics of an Artificial Neural Network and Training

ANNs, as mentioned above, make predictions on final outputs based on combinations of inputs from the previous layer. The term “layer” is important to understanding the network optimization approach that the ANN uses. Initially, an ANN takes training data for parameters in the first layer (also called the input layer) and corresponding training data for the output layer. Essentially, we seek to teach the ANN through a series of input combinations for which the corresponding output is a known value. This supervised approach is often necessary to help algorithms learn optimally and make accurate predictions. A schematic representation of an ANN is as follows [50]:

Figure 16: ANN graphic representation. Figure source: [50]


The value at each “hidden layer” node in the diagram above is given by a non-linear transfer function, which operates on a weighted sum of all inputs from the previous layer plus a bias term. On each iteration, a set of input parameter values is “propagated forward” through the ANN. These inputs, after being aggregated as a weighted combination, undergo transformations at each node. The nature of this transformation depends on the nodal design chosen by the algorithm designer. The sigmoidal unit is the transformation function we chose for each node within the ANN; that is, it is the rule that governs the output value of a node given an input equal to the linear combination of nodal values from the previous layer. Sigmoidal units are not zero-centered and are prone to saturation for large input values. However, input value magnification to the extent that the sigmoid unit would return a zero value (and kill off that particular branch of the network) is unlikely in ANNs with one hidden layer. For a deeper neural network, other activation functions like tanh, ReLU [51], and Leaky ReLU might be considered. At the end of the forward propagation, over the output layer of the ANN, we compute the L2 norm between the predicted and actual output variables. In an ANN, we update the weight vectors at each iteration with the objective of minimizing this L2 norm (Euclidean distance) over the output layer.

In directional terms, the process of back-propagation is the exact opposite of forward propagation. Subsequent to forward propagation, we calculate the losses over the output layer. We then propagate the gradient of the loss backward through the network. This process uses the chain rule, and we end up with an expression for the gradient of the loss function with respect to each of the inputs. The 'learnprocess' function in the iPython notebook lays out the underlying mathematics of the forward and backward propagation in detail.

These values are recomputed repeatedly by iteratively varying the weights (using gradient descent optimization) to minimize the squared error between the target function output and the prediction given by the training data. We stop this iterative process when the error converges to a predetermined level.
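The forward and backward passes described above can be sketched in NumPy for a single hidden layer of sigmoidal units. The data here is synthetic and the layer sizes are illustrative; this is not the exact 'learnprocess' implementation from our notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in data: 4 input features (e.g. the four outdoor factors)
# and one output in (0, 1) (e.g. a normalized indoor temperature).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = sigmoid(X @ np.array([[1.0], [-0.5], [0.5], [0.25]]))

n_hidden = 9                                   # one hidden layer of sigmoidal units
W1 = rng.normal(scale=0.5, size=(4, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

lr = 1.0
for _ in range(5000):
    # Forward propagation: weighted sum plus bias at each node, then sigmoid.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    err = y_hat - y                            # drives the squared-L2 loss over the output layer
    # Backward propagation: push the loss gradient through each sigmoid (chain rule).
    d_out = err * y_hat * (1.0 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1.0 - h)
    # Gradient-descent updates on weights and biases.
    W2 -= lr * (h.T @ d_out) / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_hid) / len(X); b1 -= lr * d_hid.mean(axis=0)

# Final training error after convergence.
y_hat = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
mse = float(np.mean((y_hat - y) ** 2))
```

In practice the stopping rule would compare `mse` against a predetermined convergence threshold rather than running a fixed number of iterations.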

There is little established science to the process of designing hidden layers in an ANN. Usually it is trial and error, and the design that gives the best prediction accuracy is chosen. As an experiment in choosing the hidden layer design for the ANN, we computed the ANN error over various hidden layer designs during testing and picked the design that resulted in the least testing error. This design was then adopted, and training was performed for that specific design (say, 9 nodes) to ensure the weights at each node were iterated toward the global minimum of the loss over the output layer. The visualizations (see following page) of the prediction error observed over a number of hidden layer designs (in terms of number of nodes) for temperature and humidity helped us arrive at an optimal hidden layer design for the implementation.


Temperature:

Figure 17: Error vs Number of Hidden Layer Nodes - Temperature.


Humidity:

Figure 18: Error vs Number of Hidden Layer Nodes - Humidity.

5.2.1.2 Results

Subsequent to training the ANNs (with the optimized hidden layer design) individually for temperature and humidity, we passed our test data sets through the ANN algorithms. The prediction results were as follows:


Temperature

Figure 19: Predicted vs Actual Temperature

Humidity

Figure 20: Predicted vs Actual Humidity

While the prediction for temperature seems slightly better than that for humidity, in general the ANN does a poor job of predicting the internal temperature and humidity. We think the availability of the IBM Smart Campus data would have provided a more expressive set of input parameters that could be, as mentioned previously, more important predictors than the external parameters. The ANN algorithm has proven effectiveness in multivariate predictive modeling, and with more time invested in researching the algorithm and building a more expressive set of input predictor variables, we believe the algorithm would show improved results.


5.2.2 Linear Regression

Having completed the ANN design and prediction, we now implement a linear regression model, primarily for causal inference but also as a learning experiment in predictive modeling. First, we seek to understand the relationship between the predictor variables (inputs) and the response variables (outputs). To do this, we first visualize the relationship of the input variables with each of internal temperature and humidity. A sample visualization of external temperature against internal humidity is as follows:

Figure 21: External Temperature vs Internal Humidity

The relationships between the predictor variables and internal temperature and humidity (Hum_1/Temp_1), therefore, appear to follow a higher-order polynomial relationship (an exhaustive set of plots of these correlations can be found in the iPython notebook). We will test polynomial regression models of different orders to determine the one that gives the best prediction.

5.2.2.1 Linear Regression Mechanics

The objective with linear regression is to estimate temperature and humidity given the predictor variable values. The equations are as follows:

Internal Temperature = a0 + a1(Solar Radiation) + a2(Outdoor Temperature) + a3(Outdoor Humidity) + a4(Wind Speed)

Internal Humidity = a0 + a1(Solar Radiation) + a2(Outdoor Temperature) + a3(Outdoor Humidity) + a4(Wind Speed)

Multivariate regression offers a convenient closed form solution that would allow us to determine the coefficients

corresponding to each input variable:


A = (X^T X)^-1 X^T Y

where A is the vector of coefficients corresponding to each predictor variable, X is the matrix of predictor variables, and Y is the corresponding vector of outputs observed in each training example.
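With a synthetic design matrix that includes an intercept column, the closed-form solution can be sketched as follows (solving the normal equations directly rather than forming the inverse explicitly, which is numerically preferable; the coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictors standing in for solar radiation, outdoor temperature,
# outdoor humidity, and wind speed.
X_raw = rng.normal(size=(100, 4))
true_a = np.array([1.5, -0.4, 2.0, 0.3, 0.7])          # a0..a4 (illustrative)

# Prepend a column of ones so a0 acts as the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
y = X @ true_a + rng.normal(scale=0.01, size=100)      # outputs with small noise

# Closed-form least-squares solution: A = (X^T X)^-1 X^T Y,
# computed by solving (X^T X) A = X^T Y.
A = np.linalg.solve(X.T @ X, X.T @ y)
```

With low noise, the recovered `A` should closely match the coefficients used to generate the data.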

We then experiment with first and second order regression approaches to prediction. The primary difference between the two is that second order regression expands the input variables into a higher dimensional space. Intuitively, this expansion allows more degrees of freedom, which in turn lets the regression better fit the data. Again, we use the training data to estimate the parameters, after which we pass the testing data through the algorithm and visualize the testing accuracy.
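The second order expansion can be sketched as a feature map that appends squares and pairwise products to the original inputs; the helper below is illustrative, not our notebook code:

```python
import numpy as np

def second_order_features(X):
    """Expand [x1..xd] into intercept, linear terms, squares, and pairwise products."""
    n, d = X.shape
    cols = [np.ones(n)]                        # intercept column
    cols += [X[:, i] for i in range(d)]        # first-order terms
    for i in range(d):
        for j in range(i, d):                  # x_i * x_j for i <= j
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

# Two samples with two features each expand to 1 + 2 + 3 = 6 columns.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Phi = second_order_features(X)
```

The same normal-equation solution is then applied to `Phi` in place of `X`, which is what gives the second order model its extra degrees of freedom.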

5.2.2.2 Linear Regression Results

First Order:

Temperature

Figure 22: First Order Prediction vs Actual Internal Temperature

Humidity


Figure 23: First Order Prediction vs Actual Internal Humidity

Second Order

Temperature

Figure 24: Second Order Prediction vs Actual Internal Temperature

Humidity


Figure 25: Second Order Prediction vs Actual Internal Humidity

An immediate takeaway from these visualizations is that linear regression (both orders) displays a better fit than the ANNs implemented in this project. The L2 loss between the actual and predicted values was calculated, and linear regression displays the lower L2 norm, indicating greater prediction accuracy.

What is not yet evident, however, is the magnitude of the benefit that the second order regression offers over the first order. To assess this, we now lay out the framework for an ANOVA (Analysis of Variance) test which, given multiple candidate predictive models, can be used to judge the relative significance of one model compared to the others (for example, a comparison between first and higher order regression, and whether one is significantly better than the other). In the iPython notebook, we put forth the method for carrying out an ANOVA F test for a first order regression model, but stop short of comparing multiple models with the F test given time constraints. Future work could include a robust ANOVA F test to compare the various predictive models and determine the one that best fits the population of observed data. As mentioned previously, error can be due to either of two causes:

- The lack of expressiveness in the regression model
- The choice of predictor variables

Together these comprise what we will refer to as regression error. The other component of error, not mentioned previously, is vague and difficult to isolate; we therefore attribute it to unexplained "error".

In the iPython notebook, we compute and lay out the formulation for ANOVA parameters such as degrees of freedom, sum of squared errors, and mean of squared errors (total, error, and regression values for each). Given the mean squared errors for the explainable error (MSreg) and unexplainable error (MSerr), the ANOVA F test calculates the F-statistic as the ratio of MSreg to MSerr. Large values of the F-statistic indicate that the mean squared explainable error (MSreg) is large, which in turn indicates that the fitted values (Ypred) are far from the overall mean. Therefore, we can conclude that the expected response changes significantly along the fitted regression plane. We observe F-statistic values for first order temperature and humidity prediction of 60.48 and 112.39 respectively, which are reasonably large compared to tabulated F values. Therefore, there is strong evidence that the regressions are significant, supporting the implementation of higher order models.
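The F-statistic computation described above can be sketched as follows; the data and the `anova_f` helper are illustrative, not our notebook implementation:

```python
import numpy as np

def anova_f(y, y_pred, p):
    """F-statistic for a fitted regression with p predictors plus an intercept."""
    n = len(y)
    ms_reg = np.sum((y_pred - y.mean()) ** 2) / p          # explained mean square (df = p)
    ms_err = np.sum((y - y_pred) ** 2) / (n - p - 1)       # unexplained mean square (df = n - p - 1)
    return ms_reg / ms_err

# Hypothetical strongly linear data: the F-statistic should come out large.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=30)

slope, intercept = np.polyfit(x, y, 1)
F = anova_f(y, slope * x + intercept, p=1)
```

The resulting `F` would then be compared against the tabulated F distribution at the corresponding degrees of freedom.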

5.3 Causal Inference

For causal inference, we computed the variance-covariance matrix, which is calculated as follows:

A = (X^T X)^-1 MSerr

where X is the input data and MSerr is the mean squared unexplained error, also termed the regression variance.
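A sketch of this computation on synthetic data (the variable names and values here are illustrative, not those in our notebook):

```python
import numpy as np

# Hypothetical design matrix: intercept column plus four outdoor predictors.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 4))])
y = X @ np.array([70.0, 0.5, 1.2, -0.3, 0.1]) + rng.normal(scale=0.5, size=50)

# Coefficients via the normal equations, then the residual (unexplained) variance.
coeffs = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ coeffs
ms_err = resid @ resid / (len(y) - X.shape[1])

# Variance-covariance matrix of the coefficient estimates; its diagonal
# entries are the per-predictor variances used to rank sensitivities.
var_cov = np.linalg.inv(X.T @ X) * ms_err
coef_var = np.diag(var_cov)
```

Sorting `coef_var` (excluding the intercept entry) produces the kind of sensitivity ranking reported below.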

The diagonal elements of the variance-covariance matrix give a measure of the sensitivity of the final output to a specific input, holding all else constant. For example, cell (2, 2) of the 'variance_vect_temp_o1' data frame in the iPython notebook gives the sensitivity of the internal temperature to the external humidity, all else constant. The matrix therefore supports causal inference, yielding an ordered list of the factors to which the internal temperature is most sensitive. Our findings are as follows:

Temperature

Table 5: Variance-Covariance Matrix - Temperature

As is clear from the diagonal elements of the matrix above, the internal temperature's sensitivity to the

predictor variables is as follows:

1. Wind Speed
2. External Temperature
3. Solar Radiation / External Humidity (tied)

Humidity


Table 6: Variance-Covariance Matrix - Humidity

As is clear from the diagonal elements of the matrix above, the internal humidity's sensitivity to the predictor variables is as follows (correlation is positive unless indicated otherwise):

1. External Humidity
2. Solar Radiation (negative correlation)
3. Wind Speed
4. External Temperature

6 Interactive Data Analysis and Visualization Platform

Visionary Scenario

Real-time indoor environmental conditions and HVAC operation status are crucial to facility managers. Consider a color map illustrating the temperature and humidity profile inside Hunt Library, which provides an immediate understanding of the environmental condition.

The visualization is also convenient for operations and maintenance. When an out-of-range condition occurs, the system notifies the facility manager via text message and email. The manager could fetch the data over the past two weeks, view the changes, and quickly locate the problem. If interested in operating variance across winters, the manager could display the relevant data in one graph and compare the differences.

In general, the visualization framework is helpful enough that facility managers can combine it with HVAC operation and energy consumption data to make informed decisions for optimization and maintenance.

Conceptual Design

Here we propose the design we envision for the visualization application. We plan an extensible framework integrating data analysis and visualization, consisting of an interactive GUI tool and underlying interfaces and implementations, illustrated below.


Figure 26: Unified Modeling Language (UML) of Data Visualization.

Data plugins are responsible for extracting data from the database and making it available for processing. Multiple data sources are supported, including local .csv files, HTML data, and trending data from SQL. This raw data is then passed to the framework for analysis.

Display plugins use the analyzed data (after processes like data cleaning, training, and testing) within the framework to plot various diagrams. Possible visualizations include temperature and humidity time-history diagrams, temperature and humidity color maps of Hunt Library, and further extensions. One commonly used display is the time-series diagram, from which users can directly discern data trends over time. The platform supports plotting any variable over any time period, with simple analysis of the imported data. A user-friendly communication of the data gathered through the network would be a temperature and humidity map; a design is suggested below and could be implemented as a display plugin.

At each moment, the indoor temperature and humidity at proposed positions inside Hunt Library could be obtained from the measurement network or predicted via the machine learning algorithm. Therefore, a mapping between the variables and floor-plan coordinates could be acquired. By linearly interpolating between each pair of nodes and connecting points with equal values, we could draw contour lines and graded colors illustrating the distribution of the variables across Hunt Library, as shown in the figure below.
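The interpolation step between a pair of nodes can be sketched as follows, with hypothetical node positions and readings:

```python
import numpy as np

# Hypothetical temperatures at two node positions (x in feet along a wall).
x_nodes = np.array([0.0, 40.0])
temp_nodes = np.array([72.0, 77.0])

# Linear interpolation between the pair of nodes; each interpolated value
# would be mapped to a color on the floor-plan overlay.
x_grid = np.linspace(0.0, 40.0, 9)
temp_grid = np.interp(x_grid, x_nodes, temp_nodes)
```

Repeating this for every pair of nodes, then joining points of equal value, yields the contour lines and color gradient of the proposed map.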


Figure 27: Color map showing temperature distribution overlaid on the floor plan.

Implementation

There are two choices for visual software developing. One is web application, which stored all the content in a

remote server. Users could get access to the information through the browser interface via internet, like

Facebook or Twitter. The web applications are usually lightweight and easily accessible, but the exposure of the

information is likely to cause some security-related problems, which requires constant time and effort for

maintenance. Another option is local implementation, which, for example, could be launched from a Python

script.

Due to the learning objectives and time limit, our team chose to create a relatively simple local application in Python, which is commonly used for data analysis, visualization, and application building. Several Python modules can help develop a graphical user interface (GUI) for an application. The ones we explored include Tkinter [52], wxPython [53], and TraitsUI [54]. For producing plots and graphics, Matplotlib [55] and Chaco [56] are useful packages. Although a complete implementation of the application was not achieved, a valuable result is a survey of the different options and their characteristics.

Tkinter is the most popular graphical user interface (GUI) toolkit for Python. It can be used to develop visual applications with buttons, drop-down menus, text boxes, and many other interactive features, and it is included with the installation of Python [52]. wxPython is similar to Tkinter but must be installed as a separate package; we investigated it because it supports transparency for graphical objects, which is desired in our application [53]. TraitsUI allows a GUI to be produced more easily via preset objects [54]. Matplotlib graphics can be integrated into an application built with any of the above modules, while Chaco is more easily used with TraitsUI than Matplotlib is. Our recommendation for the simplest implementation would be to use TraitsUI and leverage Chaco for graphics. A local application is certainly something that a facility manager or office worker could launch on their machine and interact with. However, if more time were allowed for development, a web-based and mobile-friendly application would be most useful for the modern user.


7 Conclusion

This project started by exploring the pain points that users of Hunt Library experience with regard to heating and cooling. These were identified and defined through user interviews and building surveys. We determined that the indoor conditions are not comfortable for occupants. Our solution approach starts with quantifying the spatial and temporal profiles of indoor temperature and humidity at an appropriate resolution for the building, which would objectively assess the problem. Integrating this knowledge with information on outdoor conditions and mechanical systems data would allow for prediction through modeling. Inference could also be performed to determine what factors drive indoor conditions. This information could be integrated into a visualization application, allowing a building manager to make data-driven decisions. Possible useful outcomes include identification of unacceptable indoor conditions and of poor system performance.

We then assessed technologies that could be used to measure indoor temperature and humidity. We made a final selection of the Photon and an integrated temperature and humidity sensor based on technical specifications, feasibility, cost, and learning objectives. Prototyping was done with a two node sensor network that recorded temperature and humidity for 9 days in the library. We then scaled the system to 6 nodes and redesigned the software for higher performance, collecting data for 11 days on the fourth floor of the library. Graphical exploration of the resulting data showed that there are large ranges in temperature on the fourth floor and that temperature and humidity follow trends in outdoor conditions. The sensor network we designed features mobile sensor nodes powered by AA batteries with a life of over a month. A data management solution was produced that reliably collects and stores measurements.

An artificial neural network model was developed to predict temperature and humidity for a given node based on outdoor conditions. The prediction for temperature was slightly more accurate than that for humidity, though overall accuracy left room for improvement. A framework was established for integrating mechanical systems data to improve the predictions. A regression model based on higher order statistics was implemented for inference on causal factors, as well as a second method of prediction.

A visualization application was conceptually designed. Such an application would provide graphics on indoor

conditions including plots of temperature and humidity for particular nodes as well as heat maps that could

show the spatial profile overlaid on a floorplan. It would integrate results from prediction and inference models.

A survey of methods for implementation was completed.

8 References

[1] "Wireless Sensor Tags," [Online]. Available: http://wirelesstag.net/.

[2] "Photon - Particle - Particle Store," [Online]. Available: https://store.particle.io/collections/photon.

[3] "Photon Datasheet - Particle Docs," [Online]. Available: https://docs.particle.io/photon/photon-datasheet/.

[4] "SimpleLink Wi-Fi CC3200 LaunchPad - Texas Instruments," [Online]. Available: http://www.ti.com/tool/cc3200-launchxl.

[5] "CC3200 SimpleLink, Wi-Fi Internet-of-Things Solution," [Online]. Available: http://www.ti.com/lit/ds/symlink/cc3200.pdf.

[6] "CC3200R1M1RGCR - TI store," [Online]. Available: https://store.ti.com/CC3200R1M1RGCR.aspx.

[7] "ZigBee - Texas Instruments," [Online]. Available: http://www.ti.com/lsds/ti/wireless_connectivity/zigbee/overview.page.

[8] "ZigBee Wireless Standard - Digi International," [Online]. Available: http://www.digi.com/resources/standards-and-technologies/rfmodems/zigbee-wireless-standard.

[9] "IEEE 802.15.4 - Wikipedia, the free encyclopedia," [Online]. Available: https://en.wikipedia.org/wiki/IEEE_802.15.4.

[10] "ZigBee® (IEEE 802.15.4 / ZigBee PRO) | Power Calculator," [Online]. Available: http://www.ti.com/lsds/ti/wireless_connectivity/zigbee/power_calculator.page.

[11] "CC2538 | ZigBee (IEEE 802.15.4 / ZigBee PRO) | Wireless," [Online]. Available: http://www.ti.com/product/CC2538/samplebuy.

[12] "RF Transceiver Modules: Digi-International XB24-AWI-001," [Online]. Available: http://www.digikey.com/product-detail/en/digi-international/XB24-AWI-001/XB24-AWI-001-ND/935965.

[13] "Building a Wireless Sensor Network in Your Home," [Online]. Available: http://computers.tutsplus.com/tutorials/building-a-wireless-sensor-network-in-your-home--cms-19745.

[14] "Arduino - ArduinoBoardUno," [Online]. Available: https://www.arduino.cc/en/Main/ArduinoBoardUno.

[15] "Arduino UNO Rev3 - Arduino Store USA," [Online]. Available: http://store-usa.arduino.cc/products/a000066.

[16] "Raspberry Pi 2 Model B Project Board - 1GB RAM - 900 MHz Quad-Core CPU," [Online]. Available: http://www.amazon.com/Raspberry-Pi-Model-Project-Board/dp/B00T2U7R7I/ref=sr_1_2?s=pc&ie=UTF8&qid=1455768551&sr=1-2&keywords=raspberry+pi.

[17] "4-Step Guide to Choosing the Right Temperature Sensor," [Online]. Available: http://www.dataloggerinc.com/content/resources/data_logging_tutorials/295/4-step_guide_to_choosing_the_right_temperature_sensor/.

[18] "Thermocouples - OMEGA Engineering," [Online]. Available: http://www.omega.com/prodinfo/thermocouples.html.

[19] "uxcell K Type -50-700C Thermocouple Probe Temperature," [Online]. Available: http://www.amazon.com/uxcell-50-700C-Thermocouple-Temperature-Sensor/dp/B00D8337YW.

[20] "Capgo - Semiconductor Temperature Sensors," [Online]. Available: http://www.capgo.com/Resources/Temperature/Semiconductor/Semi.html.

[21] "Specifying RTDs - Smart Sensors Inc.," [Online]. Available: http://www.smartsensors.com/specrtds.pdf.

[22] "RTD Specifications," [Online]. Available: http://www.omega.com/Temperature/pdf/RTDSpecs_Ref.pdf.

[23] "AGPtek Stainless Steel PT100 RTD Thermistor Sensor Probe," [Online]. Available: http://www.amazon.com/AGPtek-Stainless-Thermistor-Sensor-Temperature/dp/B008YP1D04.

[24] "Amazon.com: Liquid tight RTD sensor, 34 mm probe, 1/8 NPT Thread," [Online]. Available: http://www.amazon.com/Liquid-tight-sensor-probe-Thread/dp/B00BFF843O.

[25] "caldera spa temperature sensor thermistor ewgx272 #71578," [Online]. Available: http://www.calderaspapartsplus.com/caldera-spa-temperature-sensor-thermistor-ewgx272-71578/.

[26] "10K Thermistor Sensors - TempSensing.com," [Online]. Available: https://www.tempsensing.com/zc/index.php?main_page=index&cPath=4_5_9.

[27] "Choosing a Humidity Sensor: A Review of Three Technologies," [Online]. Available: http://www.sensorsmag.com/sensors/humidity-moisture/choosing-a-humidity-sensor-a-review-three-technologies-840.

[28] "Particle Store," [Online]. Available: https://store.particle.io/.

[29] "Wireless Sensor Tags," [Online]. Available: http://wirelesstag.net/.

[30] "CC3200 SimpleLink™ Wi-Fi® and Internet-of-Things solution, a Single-Chip Wireless MCU," [Online]. Available: http://www.ti.com/product/CC3200.

[31] "ARDUINO UNO REV3," [Online]. Available: http://store-usa.arduino.cc/products/a000066.

[32] "What Is Zigbee?," [Online]. Available: http://www.zigbee.org/.

[33] "Digital relative humidity & temperature sensor RHT03," [Online]. Available: http://cdn.sparkfun.com/datasheets/Sensors/Weather/RHT03.pdf.

[34] "ZEEFO 2 Pack Wall Charger," [Online]. Available: http://www.amazon.com/ZEEFO-Charger-Quality-Adapter-Samsung/dp/B00V7N3AMO/ref=sr_1_3?ie=UTF8&qid=1456174037&sr=8-3&keywords=single+usb+charger.

[35] "AmazonBasics USB 2.0 Extension Cable," [Online]. Available: http://www.amazon.com/AmazonBasics-Extension-Cable--Male--Female/dp/B00NH11PEY/ref=sr_1_5?s=pc&ie=UTF8&qid=1456174081&sr=1-5&keywords=usb+extension+cable.

[36] "AmazonBasics Micro-USB to USB Cable 2-Pack - 3-Feet," [Online]. Available: http://www.amazon.com/AmazonBasics-Micro-USB-USB-Cable-2-Pack/dp/B00NH13O7K/ref=sr_1_6?ie=UTF8&qid=1459129379&sr=8-6&keywords=micro+USB+cable.

[37] "SparkFun Photon Battery Shield," [Online]. Available: https://www.sparkfun.com/products/13626.

[38] "250 mA Low Quiescent Current LDO Regulator," [Online]. Available: http://ww1.microchip.com/downloads/en/DeviceDoc/22008E.pdf.

[39] "Discharge tests of AA Batteries, Alkaline and NiMH," [Online]. Available: http://www.powerstream.com/AA-tests.htm.

[40] "Find the energy contained in standard battery sizes," [Online]. Available: http://www.allaboutbatteries.com/Energy-tables.html.

[41] "ENERGIZER E91 Product Datasheet," [Online]. Available: http://data.energizer.com/PDFs/E91.pdf.

[42] "uxcell 2 Pcs 4 x AA 6V Battery Holder Case," [Online]. Available:

http://www.amazon.com/dp/B00HR93NJM/ref=sr_ph?ie=UTF8&qid=1459129674&sr=1&keywords=4+AA

+battery+holder.

[43] "AmazonBasics AA Performance Alkaline Batteries (48-Pack)," [Online]. Available:

http://www.amazon.com/AmazonBasics-Performance-Alkaline-Batteries-48-

Pack/dp/B00MNV8E0C/ref=pd_sim_sbs_21_1?ie=UTF8&dpID=61G-

GoYTzqL&dpSrc=sims&preST=_AC_UL160_SR160%2C160_&refRID=04KJF5H2VQPC3D4R2ZAZ.

[44] "Microchip Technology MCP1702-4002E/TO," [Online]. Available: http://www.digikey.com/product-

detail/en/microchip-technology/MCP1702-4002E%2FTO/MCP1702-4002E%2FTO-ND.

[45] "SPARKFUNRHT03," [Online]. Available:

https://build.particle.io/libs/55f712da4121e67e380006d6/tab/RHT03-Example-Serial.ino.

[46] "SparkFun Inventor's Kit for Photon Experiment Guide," [Online]. Available:

https://learn.sparkfun.com/tutorials/sparkfun-inventors-kit-for-photon-experiment-guide/experiment-6-

environment-monitor.

[47] "AET Insights," Accredited Environmental Technologies, Inc. , 2010. [Online]. Available:

http://aetinc.biz/newsletters/2010-insights/march-2010.

[48] J. Yang, "Building energy prediction with adaptive artificial neural networks," in Datta, D. S. (2000).

Application of neural networks for the prediction of the energy consumption in a supermarket. Proceedings

of the International Conference CLIMA, (pp. 98-107)., Montréal, 2005.

42

[49] D. S. Datta, "supermarket, Application of neural networks for the prediction of the energy consumption in

a," in Proceedings of the International Conference CLIMA, 2000.

[50] J. Salatas, "Implementation of Elman Recurrent Neural Network in WEKA," 10 September 2011. [Online].

Available: http://jsalatas.ictpro.gr/implementation-of-elman-recurrent-neural-network-in-weka/.

[51] I. S. G. E. H. Alex Krizhevsky, "ImageNet Classification with Deep Convolutional Neural Networks," [Online].

Available: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf. [Accessed 06 May 2016].

[52] "Tkinter," [Online]. Available: https://wiki.python.org/moin/TkInter.

[53] "wxPython," [Online]. Available: http://www.wxpython.org/.

[54] "TraitsUI," ENTHOUGHT, [Online]. Available: http://code.enthought.com/projects/traits_ui/.

[55] "matplotlib," [Online]. Available: http://matplotlib.org/.

[56] "Chaco," ENTHOUGHT, [Online]. Available: http://code.enthought.com/projects/chaco/.

[57] "Amazon.com: Teenitor 3.7V 600mAh 20C Lipo Battery," [Online]. Available:

http://www.amazon.com/Teenitor-600mAh-Battery-Charger-Parts/dp/B00LK0DY3O.

[58] "SparkFun Photon Battery Shield - DEV-13626 - SparkFun," [Online]. Available:

https://www.sparkfun.com/products/13626.

[59] "Solderless Plug-in BreadBoard, 830 tie-points, 2 Power lanes.," [Online]. Available:

http://www.amazon.com/Solderless-Plug-BreadBoard-tie-points-200PTS/dp/B005GYATUG.

[60] "iXCC 10ft 2pc Long USB2.0 - MicroUSB to USB Cable, A Male to Micro B Charge and Sync Cord For

Android/Samsung/Windows/MP3/Camera and other Device," [Online]. Available:

http://www.amazon.com/iXCC-10ft-2pc-Long-USB2-0/dp/B00DYWC0BI.

[61] "Amazon.com: MYT ® wall charger power supply adapter DC," [Online]. Available:

http://www.amazon.com/MYT-charger-supply-adapter-Barrel/dp/B00ZOHBE6I.

[62] "Bluetooth - Wikipedia, the free encyclopedia," [Online]. Available:

https://en.wikipedia.org/wiki/Bluetooth.

[63] "Building a Wireless Sensor Network in Your Home," [Online]. Available:

http://computers.tutsplus.com/tutorials/building-a-wireless-sensor-network-in-your-home--cms-19745.

[64] "Xively by LogMeIn," [Online]. Available: https://xively.com/.

[65] "Bluetooth® 4.0 Low Energy Single Mode Smart Sensors," [Online]. Available:

http://www.blueradios.com/hardware_sensors.htm.

[66] W. D. Jankowski, "Survey of Neural Transfer Functions," [Online]. Available:

ftp://ftp.icsi.berkeley.edu/pub/ai/jagota/vol2_6.pdf. [Accessed 4 April 2016].

43

[67] "Board Mount Temperature Sensors TEMP SENS, RTD platnum-clad NI wire (1 piece)," [Online]. Available:

http://www.amazon.com/Board-Mount-Temperature-Sensors-platnum-

clad/dp/B005T9O81O/ref=sr_1_28?s=industrial&ie=UTF8&qid=1455748360&sr=1-

28&keywords=rtd+temperature+sensor.

[68] "Humidity sensor-AM1001 AOSONG - 1 / 5 Pages," [Online]. Available:

http://pdf.directindustry.com/pdf/aosong-electronics-co-ltd/humidity-sensor-am1001-aosong/121567-

472713.html.

[69] "HDC1080 Low Power, High Accuracy Digital Humidity Sensor with Temperature Sensor," [Online].

Available: http://www.ti.com/product/HDC1080.

[70] "Product Datasheet - Energized E95," [Online]. Available: http://data.energizer.com/PDFs/e95.pdf.

[71] I. S. G. E. H. Alex Krizhevsky, "ImageNet Classification with Deep Convolutional Neural Networks," [Online].

Available: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf. [Accessed 06 May 2016].

9 Appendix

9.1 Sensor Node Software: Photon C++ Script

Below we show the code for Node 1. The remaining five nodes run identical code, except that every number identifier for node 1 is changed to the corresponding node number.

// Flash this code to Node 1
// Node 1 device ID: 32001d001347343432313031

// Include the library for the RHT03 digital temperature/humidity sensor
#include "SparkFunRHT03/SparkFunRHT03.h"

// Digital pin that receives data from the RHT03
const int RHT_SENSOR_PIN = D1;

// Variables to hold the temperature and humidity readings
float temp;
float hum;
String data_str;

// Sensor object used to read temperature and humidity
RHT03 rht;

void setup() {
    RGB.brightness(5);         // set the LED brightness to low
    rht.begin(RHT_SENSOR_PIN); // initialize the sensor
}

void loop() {
    // Read new values from the sensor (these are published to the cloud below)
    int update = rht.update();
    // If the read fails, retry repeatedly until it succeeds
    while (update != 1) {
        delay(1100); // allow the sensor time to reset
        update = rht.update();
    }
    // Store the readings
    temp = rht.tempF();
    hum = rht.humidity();
    // Convert to a string, e.g. "temp1:70.02humd1:40.50"
    data_str = "temp1:" + String(temp, 2) + "humd1:" + String(hum, 2);
    // Publish the data to the 12745_data event with a 600 s TTL
    Particle.publish("12745_data", data_str, 600);
    // Allow 4.5 seconds for the publish to complete
    delay(4500);
    // Deep-sleep for 10 minutes minus the 4.5 s delay
    System.sleep(SLEEP_MODE_DEEP, 595.5);
}
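Since the six firmware images differ only in the node number, the per-node source files could in principle be generated mechanically rather than edited by hand. A small sketch of that idea in Python (the `make_node_source` helper is hypothetical, assuming identifiers follow the `temp1`/`humd1` and "Node 1" patterns shown above):

```python
def make_node_source(node1_src, n):
    """Hypothetical helper: derive the source for node n from the node-1
    listing by rewriting the tempN/humdN field names and node comments."""
    src = node1_src.replace("temp1:", "temp%d:" % n).replace("humd1:", "humd%d:" % n)
    return src.replace("Node 1", "Node %d" % n)

line = 'data_str = "temp1:" + String(temp, 2) + "humd1:" + String(hum, 2);'
print(make_node_source(line, 3))
```

The device ID comment would still need to be set per board, since each Photon has its own unique ID.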


9.2 Raspberry Pi: Logger Python Script

######################################
### Data Logging Script            ###
### AIS Project, 2016              ###
### Hunt Library Heating & Cooling ###
### Joseph Dryer & Alex Shu        ###
######################################

###############################
### IMPORT REQUIRED MODULES ###
###############################
import Queue, threading, time, datetime, csv
from pytz import timezone
from sseclient import SSEClient  # https://pypi.python.org/pypi/sseclient/0.0.8

###############################
###    DEFINE FUNCTIONS     ###
###############################

### FUNCTION TO RETRIEVE THE CURRENT TIME ###
def get_time():
    # Return the current time as a string in the format "yyyy-mm-dd hh:mm AM/PM"
    return datetime.datetime.now(timezone('US/Eastern'))\
           .replace(tzinfo=None).strftime("%Y-%m-%d %I:%M %p")

### FUNCTION TO SET UP CSV FILES FOR WRITING ###
def init_csv(node):
    '''
    Creates a CSV file for a node's data to be stored in.
    '''
    # Create a CSV file with the node number and current date embedded in the file name
    date=datetime.datetime.now(timezone('US/Eastern')).replace(tzinfo=None).date()
    file_name='datafile'+str(date)+"-node"+str(node)+'.csv'
    # create/open the file
    file=open(file_name,'wb')
    write=csv.writer(file)
    # string identifiers for the time, temperature, humidity columns with node number
    time,t,h='time'+str(node),'t'+str(node),'h'+str(node)
    # add the column headers
    write.writerow([time,t,h])
    file.close()
    # return the file name for later use
    return file_name

####################################
###   DEFINE CLASSES / THREADS  ###
####################################

### LISTENER - LISTENS FOR SSEs, PUTS DATA IN QUEUE ###
class listener(threading.Thread):
    def __init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
        super(listener,self).__init__()
        self.target=target
        self.name=name
        print "Initializing Listener..."
    def run(self):
        events=SSEClient(url)
        # pre-set these to False - when data comes in, the values will be set
        t,h=False,False
        for event in events:
            # convert unicode to string and slice out the indices holding the desired values
            data=event.data.encode('ascii','ignore')[9:31]
            if data=="":           # if there is no data (ghost event)
                continue           # skip the rest of this iteration of the loop
            if data[0:4]=='temp':  # check that it's our event, or the lines below will raise errors
                temp_data=data[0:11]   # the temperature field, e.g. temp1:70.02
                humd_data=data[11:]    # the humidity field, e.g. humd1:40.50
                node=int(temp_data[4]) # get the node number by parsing the name string
                t=temp_data[6:]        # get the temperature value
                h=humd_data[6:]        # get the humidity value
                now=get_time()         # get the time
                q.put((node,now,t,h))  # put the data in the queue as a tuple
                global event_time      # declare event_time global to update it
                event_time=time.time() # record the event time (for the timer thread)

### LOGGER - PULLS FROM THE QUEUE AND SAVES DATA ###
class logger(threading.Thread):
    def __init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
        super(logger,self).__init__()
        self.target=target
        self.name=name
        print "Initializing Logger..."
    def run(self):
        while True:
            if not q.empty():  # if the queue has data in it...
                data=q.get()   # grab a piece of data
                # decode the tuple
                node,time_stamp,temp,hum = data[0],data[1],data[2],data[3]
                # find the file name
                file_name=file_names[node-1]
                file=open(file_name,'a')  # open the file
                write=csv.writer(file)
                row=[time_stamp,temp,hum]
                write.writerow(row)  # write to the file
                print [node]+row     # print the data
                file.close()         # close the file
            time.sleep(30)  # sleep 30 seconds... avoid using all the CPU!
        return

### TIMER - RESTARTS THE LISTENER IF IT HANGS ###
class timer(threading.Thread):
    def __init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
        super(timer,self).__init__()
        self.target=target
        self.name=name
        print "Initializing Timer..."
    def run(self):
        while True:
            global event_time
            # if 2 hours pass with no data, restart the listener
            if (time.time()-event_time)>2*60*60:
                # re-instantiate the listener
                li=listener(name='listener')
                li.start()
                # reset the timer
                event_time=time.time()
            time.sleep(60*60)  # sleep for an hour (you don't want this to run constantly)
        return

###############################
###          MAIN           ###
###############################
if __name__ == '__main__':
    ### INITIALIZE FILES AND FILE NAMES ###
    num_nodes=6    # set the number of nodes
    file_names=[]  # list of file names, one per node
    for node in range(num_nodes):
        # create a file for each node and add its name to the list
        file_names.append(init_csv(node+1))
    print file_names
    ### SET INITIAL EVENT TIME TO NOW ###
    global event_time
    event_time=time.time()
    ### DEFINE URL TO CLOUD EVENT ###
    url='https://api.particle.io/v1/events/12745_data?access_token=cba096a0629d2c9609eeff7f662588b4b32e8218'
    ### CREATE A QUEUE TO PASS DATA BETWEEN THREADS ###
    q=Queue.Queue(0)  # no size limit
    ### CREATE INSTANCES OF THE THREADS ###
    li=listener(name='listener')
    lo=logger(name='logger')
    ti=timer(name='timer')
    ### START THE THREADS ###
    li.start()
    lo.start()
    ti.start()
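The listener's string slicing relies on the fixed width of the published payload: each field is printed with two decimal places, so "tempN:tt.tt" and "humdN:hh.hh" are 11 characters each. A standalone sketch of that parse, using a sample payload (the JSON wrapper shown is an assumed example of the Particle event format; the listener itself keeps the values as strings, while this sketch converts them to floats for clarity):

```python
# Sample SSE event body: 9 characters of JSON wrapper, then the 22-character payload
raw = '{"data":"temp1:70.02humd1:40.50","ttl":600}'

data = raw[9:31]          # strip the wrapper -> 'temp1:70.02humd1:40.50'
temp_data = data[0:11]    # 'temp1:70.02'
humd_data = data[11:]     # 'humd1:40.50'
node = int(temp_data[4])  # node number, parsed from the field name
t = float(temp_data[6:])  # temperature value
h = float(humd_data[6:])  # humidity value
print(node, t, h)
```

Because the slice indices are hard-coded, a payload with a different field width (e.g. a node number above 9, or temperatures below 10 °F) would be mis-parsed; the fixed format published by the Photons avoids this.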


9.3 Raspberry Pi Dropbox Backup Python Script

In addition to the Python script below, the Dropbox-Uploader shell script must be downloaded (follow the instructions in the source mentioned in the report body) such that it can be called from the Python script. The GitHub link is: https://github.com/andreafabrizi/Dropbox-Uploader

from subprocess import call
import time

while True:
    # upload the logger directory to Dropbox
    file="/home/pi/Dropbox-Uploader/dropbox_uploader.sh upload /home/pi/photon/logger logger"
    call([file], shell=True)
    # wait 3 hours (600*6*3 = 10800 seconds) between uploads
    time.sleep(600*6*3)

9.4 Group Account Information for Gmail, Dropbox, Particle Build

The username and password for the group's Gmail, Dropbox, and Particle Build accounts are as follows:

Username: [email protected]
Password: huntlibrary

The Particle Build account gives access to the code currently on the Photons and allows over-the-air updates. To flash code to the Photons, turn them off and then on while holding the SETUP button. When the LED flashes magenta, release the button; the Photon is then in cloud mode and ready to receive over-the-air code.

9.5 IPython Notebook for Plotting Data

The IPython notebook is attached as a PDF in the pages that follow.

9.6 Data Analysis IPython Notebook

The IPython notebook is attached as a PDF in the pages that follow.

IPython Notebook - Plotting Temperature and Humidity Data

Import Required Packages

In [1]: import pandas as pd
        import matplotlib.pyplot as plt
        import matplotlib.dates as dates
        import datetime as datetime
        import numpy as np
        %matplotlib inline

Two Node System

In [2]: df2=pd.read_csv("datafile2016-04-07.csv", sep=',')

In [3]: df2['time']=pd.to_datetime(df2['time'],format="%Y-%m-%dT%H:%M:%S")+pd.DateOffset(hours=-4)

In [4]: df2=df2.set_index('time')

In [5]: df2=df2.convert_objects(convert_numeric=True).dropna();

C:\Users\jdryer\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':

In [6]: plot=df2[['t1','t2']].plot(style='.',fontsize=14,rot=45,\
        ylim=(75,84),use_index=True,figsize=(15,5),grid=True)
        plot.set_ylabel('Temperature (F)')
        plot.set_xlabel('Time Stamp')
        plot.xaxis.set_major_formatter(dates.DateFormatter('%m/%d'))
        plot.legend(['Node 1','Node 2'],loc='lower right',ncol=2)

Out[6]: <matplotlib.legend.Legend at 0x9363908>

In [7]: plot=df2[['r1','h2']].plot(style='.',fontsize=14,rot=45,ylim=(15,42),use_index=True,figsize=(15,5),grid=True)
        plot.set_ylabel('Relative Humidity (%)')
        plot.set_xlabel('Time Stamp')
        plot.xaxis.set_major_formatter(dates.DateFormatter('%m/%d'))
        plot.legend(['Node 1','Node 2'],loc='upper right',ncol=2)

Out[7]: <matplotlib.legend.Legend at 0x65ed9b0>

In [8]: df2=df2.resample('10T',how='mean')

In [9]: df2.columns=('t1','h1','t2','h2')

In [10]: df2.head()

Out[10]:                         t1    h1     t2     h2
         time
         2016-04-06 21:00:00  78.98  21.0  76.64  21.65
         2016-04-06 21:10:00  78.98  21.4  76.64  21.95
         2016-04-06 21:20:00  78.98  21.5  76.64  22.00
         2016-04-06 21:30:00  78.98  21.7  76.64  22.20
         2016-04-06 21:40:00  78.98  21.7  76.64  22.60

Six Node System

In [11]: df6=pd.read_excel('6 Node System Data.xlsx',sheetname=None)

In [12]: node_1=df6['Node 1'].set_index('time1')
         node_2=df6['Node 2'].set_index('time2')
         node_3=df6['Node 3'].set_index('time3')
         node_4=df6['Node 4'].set_index('time4')
         node_5=df6['Node 5'].set_index('time5')
         node_6=df6['Node 6'].set_index('time6')

In [13]: ax1=node_1.t1.plot(rot=45,fontsize=14,ylim=(75,83),style='.',figsize=(15,5))
         node_2.t2.plot(ax=ax1,style='.')
         node_3.t3.plot(ax=ax1,style='.')
         node_4.t4.plot(ax=ax1,style='.')
         node_5.t5.plot(ax=ax1,style='.')
         node_6.t6.plot(ax=ax1,style='.')
         ax1.legend(['Node 1','Node 2','Node 3','Node 4',\
         'Node 5','Node 6'],loc='lower center',ncol=6)
         ax1.xaxis.set_major_formatter(dates.DateFormatter('%m/%d'))
         ax1.set_ylabel('Temperature (F)',fontsize=14)
         ax1.set_xlabel('Date',fontsize=14)
         ticks=pd.Series(node_3.index).map(pd.Timestamp.date).unique()
         plt.xticks(ticks[1:]);

In [14]: ax2=node_1.h1.plot(rot=45,fontsize=14,ylim=(17,55),style='.',figsize=(15,5))
         node_2.h2.plot(ax=ax2,style='.')
         node_3.h3.plot(ax=ax2,style='.')
         node_4.h4.plot(ax=ax2,style='.')
         node_5.h5.plot(ax=ax2,style='.')
         node_6.h6.plot(ax=ax2,style='.')
         ax2.legend(['Node 1','Node 2','Node 3','Node 4','Node 5','Node 6'],loc='lower center',ncol=6)
         ax2.xaxis.set_major_formatter(dates.DateFormatter('%m/%d'))
         ax2.set_ylabel('Relative Humidity (%)',fontsize=14)
         plt.xticks(pd.Series(node_3.index).map(pd.Timestamp.date).unique()[1:]);
         ax2.set_xlabel('Date',fontsize=14)

Out[14]: <matplotlib.text.Text at 0x9f59ef0>

In [15]: node_1=node_1.resample('10T',how='mean')
         node_2=node_2.resample('10T',how='mean')
         node_3=node_3.resample('10T',how='mean')
         node_4=node_4.resample('10T',how='mean')
         node_5=node_5.resample('10T',how='mean')
         node_6=node_6.resample('10T',how='mean')

In [16]: df6=pd.concat([node_1,node_2,node_3,node_4,node_5,node_6],axis=1)

In [17]: df_i=df6.append(df2)

Comparing to Outdoor Conditions

In [18]: df_o=pd.read_csv('WeatherData_1.csv')

In [19]: df_o.columns=['time','rad','hum','temp','wind']

In [20]: df_o['time']=pd.to_datetime(df_o['time'],format="%m/%d/%Y %H:%M")

In [21]: df_o=df_o.set_index('time').convert_objects(convert_numeric=True).dropna()

C:\Users\jdryer\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':

In [22]: df_o=df_o.resample('10T',how='mean')

In [23]: df_all=pd.concat([df_i,df_o],axis=1)

In [24]: ax=df_all[['t1','t2','t3','t4','t5','t6','temp']].plot(figsize=(15,5),style='.')
         ax.legend(['Node 1','Node 2','Node 3','Node 4','Node 5','Node 6',\
         'Outdoor Temperature'],loc='upper right',ncol=7)
         ax.set_ylabel('Temperature (F)',fontsize=14)
         ax.set_xlabel('Date',fontsize=14)

Out[24]: <matplotlib.text.Text at 0x9a452b0>

In [25]: ax=df_all[['h1','h2','h3','h4','h5','h6','hum']].plot(figsize=(15,5),style='.')
         ax.legend(['Node 1','Node 2','Node 3','Node 4','Node 5','Node 6',\
         'Outdoor Humidity'],loc='lower right',ncol=7)
         ax.set_ylabel('Relative Humidity (%)',fontsize=14)
         ax.set_xlabel('Date',fontsize=14)

Out[25]: <matplotlib.text.Text at 0xbd6a6d8>

Data Analysis: Modeling for Prediction and Inference

May 6, 2016

Node 1: Data Import, ANN, Linear Regression, and Statistical Inference

Data Import and Cleaning

First, we import the libraries that we will need to carry out the data analysis. Further libraries will also be imported later in the code as the need arises.

In [1]: %matplotlib inline
        import numpy as np
        import pandas as pd
        from scipy import stats
        from scipy import linalg
        import matplotlib
        import matplotlib.pyplot as plt
        pd.set_option('display.mpl_style', 'default')
        pd.options.display.max_rows = 5000

We import all the nodal data from the sensor network, collected over the last three weeks, using the xlrd library.

In [2]: import xlrd
        xl = pd.ExcelFile("6 Node System Data.xlsx")
        sensordata_pd = xl.parse("Node 1")
        #Imports data from the Node 1 sheet

Next, we convert strings of imported parameter values to floating-point values using the convert_objects() method.

In [3]: sensordata_pd=sensordata_pd.convert_objects(convert_numeric=True)

sensordata_pd.head()

Out[3]: time1 t1 h1

0 2016-04-22 23:30:00 78.44 36.2

1 2016-04-22 23:40:00 79.88 36.4

2 2016-04-22 23:50:00 79.70 36.4

3 2016-04-23 00:00:00 79.88 36.3

4 2016-04-23 00:10:00 79.88 36.2

Conversion of the datetime index to a numpy.datetime64 datatype is essential for the resampling and database operations performed later in this notebook. We convert using the to_datetime method described in the Pandas documentation.

In [4]: sensordata_pd['time1']=pd.to_datetime(sensordata_pd.time1.values,\
        format='%Y-%m-%d %H:%M:%S.%f')
        #Converts time1 from a string to a timestamp

sensordata_pd.head()


Out[4]: time1 t1 h1

0 2016-04-22 23:30:00 78.44 36.2

1 2016-04-22 23:40:00 79.88 36.4

2 2016-04-22 23:50:00 79.70 36.4

3 2016-04-23 00:00:00 79.88 36.3

4 2016-04-23 00:10:00 79.88 36.2

In [5]: type(sensordata_pd.time1.values[0])

#Timestamp conversion in place

Out[5]: numpy.datetime64

We rename ‘t1’ and ‘h1’ to make them more readable.

In [6]: sensordata_pd.rename(columns={'time1': 'Time_Stamp', \
        't1': 'Temp_1', 'h1': 'Hum_1'}, inplace=True)

#Renames data variables

sensordata_pd.head()

Out[6]:            Time_Stamp  Temp_1  Hum_1

0 2016-04-22 23:30:00 78.44 36.2

1 2016-04-22 23:40:00 79.88 36.4

2 2016-04-22 23:50:00 79.70 36.4

3 2016-04-23 00:00:00 79.88 36.3

4 2016-04-23 00:10:00 79.88 36.2

As mentioned previously, it is essential for the timestamp to be set as the index of the dataframe, given the need to perform resampling and database joins.

In [7]: sensordata_pd=sensordata_pd.set_index('Time_Stamp')

#Makes timestamp the index

sensordata_pd.head()

Out[7]:              Temp_1  Hum_1
        Time_Stamp

2016-04-22 23:30:00 78.44 36.2

2016-04-22 23:40:00 79.88 36.4

2016-04-22 23:50:00 79.70 36.4

2016-04-23 00:00:00 79.88 36.3

2016-04-23 00:10:00 79.88 36.2

Prior to resampling, we ensure that any unsampled datapoints and their corresponding time indices are removed from the dataframe.

In [8]: sensordata_pd= sensordata_pd.dropna(axis=0, how='any',\
        thresh=None, subset=None, inplace=False)

#Drops nan/nat values in the input data

sensordata_pd.head()

Out[8]:              Temp_1  Hum_1
        Time_Stamp

2016-04-22 23:30:00 78.44 36.2

2016-04-22 23:40:00 79.88 36.4

2016-04-22 23:50:00 79.70 36.4

2016-04-23 00:00:00 79.88 36.3

2016-04-23 00:10:00 79.88 36.2


In [9]: type(sensordata_pd.index.values[0])

#Confirms that the index dtype is datetime

Out[9]: numpy.datetime64

In [10]: sensordata_pd.shape

#Shape of the dataset subsequent to removal of nan values

Out[10]: (1480, 2)

Resampling the data ensures that the data acquisition timestamps align with a subset of the timestamps for data collected from the Centre for Building Performance and Diagnostics. In the future, this would also help with alignment against internal HVAC system data from Hunt Library.

In [11]: sensordata_pd=sensordata_pd.resample('10T',how='mean')
         #Resamples to align timestamps with 10 minute intervals
         #by taking means of intermediate variables

sensordata_pd.head()

Out[11]:              Temp_1  Hum_1
         Time_Stamp

2016-04-22 23:30:00 78.44 36.2

2016-04-22 23:40:00 79.88 36.4

2016-04-22 23:50:00 79.70 36.4

2016-04-23 00:00:00 79.88 36.3

2016-04-23 00:10:00 79.88 36.2

In [12]: sensordata_pd.shape

#Checks shape of resampled dataframe. Notice the change in shape.

Out[12]: (1671, 2)

Again, we drop any null values in the resampled dataframe.

In [13]: sensordata_pd= sensordata_pd.dropna(axis=0, how='any', \
         thresh=None, subset=None, inplace=False)

#Removes nan/nat values from the resampled data (Precautionary)

sensordata_pd.head()

Out[13]:              Temp_1  Hum_1
         Time_Stamp

2016-04-22 23:30:00 78.44 36.2

2016-04-22 23:40:00 79.88 36.4

2016-04-22 23:50:00 79.70 36.4

2016-04-23 00:00:00 79.88 36.3

2016-04-23 00:10:00 79.88 36.2

In [14]: sensordata_pd.shape #Shape of the cleaned resampled data

Out[14]: (1446, 2)

Now, we visualize the temperature and humidity time series, as observed at Node 1.

In [15]: plot_1=sensordata_pd.plot(figsize=[20,8])

#Graphs all the variables with respect to time


Outlier Removal

Next, we seek to isolate and remove data points that are outliers and not representative of the range of values that we would expect to see for these variables. These outliers could be the result of faulty collection equipment or adverse external factors that would cause a spike in temperature or humidity; a fire, for example, would cause an outlier in the observed data. To do this, we assume that, asymptotically, the temperature and humidity readings converge on a normal distribution. Given the characteristics of a normal distribution, it is improbable that any accurate reading lies more than 3 standard deviations from the mean, so we remove all such data points.

In [16]: sensordata_pd=sensordata_pd[(np.abs(stats.zscore(sensordata_pd)) < 3).all(axis=1)]
         #Computes the Z-statistic and removes rows with a Z-score greater than 3
         sensordata_pd.shape

Out[16]: (1430, 2)

Underlying Z-statistic method, with an example:

    df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'))
    df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]

To filter the DataFrame where only ONE column (e.g. 'B') is within three standard deviations:

    df[((df.B - df.B.mean()) / df.B.std()).abs() < 3]

Or replace the all method with any.

In [17]: sensordata_pd= sensordata_pd.dropna(axis=0,\
         how='any', thresh=None, subset=None, inplace=False)

#Drops any Nan Values after removal of outliers (Precautionary/Sanity Check)

sensordata_pd.shape

Out[17]: (1430, 2)

Now, we go back and visualize the temperature and humidity time series as observed at Node 1. Notice how the outliers observed in plot 1 are no longer visible here.

In [18]: plot_2=sensordata_pd.plot(figsize=[20,8])


Input Data

Next, we import external data collected by the Centre for Building Performance and Diagnostics over the period from the start of March 2016 through the afternoon of May 4th, 2016. The procedure to import and process this data is identical to the one described above, so we will not go into a detailed description.

In [19]: Campus_Data=pd.read_csv('WeatherData_1.csv')

#Imports Data

Campus_Data.head()

Out[19]: Unnamed: 0 Global Solar Radiation Humidity Temperature Wind Speed

0 NaN W/m2 % F mph

1 3/1/16 0:00 1.0 63.3 43.7 1.7

2 3/1/16 0:05 1.0 63.5 43.7 0.0

3 3/1/16 0:10 1.0 63.9 43.5 0.0

4 3/1/16 0:15 1.0 63.0 43.8 1.7

In [20]: Campus_Data=Campus_Data.convert_objects(convert_numeric=True)

In [21]: type(Campus_Data.Humidity.values[0])

Out[21]: numpy.float64

In [22]: Campus_Data.rename(columns={'Unnamed: 0':\
         'TimeStamp', 'Global Solar Radiation':\
         'Solar_Radiation', 'Humidity': 'Humidity',\
         'Temperature': 'Temperature',\
         'Wind Speed': 'Wind_Speed'}, inplace=True)

Campus_Data.head()

Out[22]:       TimeStamp  Solar_Radiation  Humidity  Temperature  Wind_Speed

0 NaN NaN NaN NaN NaN

1 3/1/16 0:00 1 63.3 43.7 1.7

2 3/1/16 0:05 1 63.5 43.7 0.0

3 3/1/16 0:10 1 63.9 43.5 0.0

4 3/1/16 0:15 1 63.0 43.8 1.7


In [23]: Campus_Data.shape

Out[23]: (18614, 5)

In [24]: Campus_Data= Campus_Data.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Campus_Data.head()

Out[24]:       TimeStamp  Solar_Radiation  Humidity  Temperature  Wind_Speed

1 3/1/16 0:00 1 63.3 43.7 1.7

2 3/1/16 0:05 1 63.5 43.7 0.0

3 3/1/16 0:10 1 63.9 43.5 0.0

4 3/1/16 0:15 1 63.0 43.8 1.7

5 3/1/16 0:20 1 63.0 43.7 0.6

In [25]: Campus_Data.shape

Out[25]: (18593, 5)

In [26]: Campus_Data.TimeStamp.values

Out[26]: array(['3/1/16 0:00', '3/1/16 0:05', '3/1/16 0:10', ..., '5/4/16 14:50',
                '5/4/16 14:55', '5/4/16 15:00'], dtype=object)

In [27]: Campus_Data['TimeStamp']=pd.to_datetime(Campus_Data.TimeStamp.values,\
         infer_datetime_format=True, format='%m/%d/%Y %H:%M')

type(Campus_Data.TimeStamp.values[1])

Out[27]: numpy.datetime64

In [28]: Campus_Data=Campus_Data.set_index('TimeStamp')

#Makes timestamp the index

Campus_Data.head()

Out[28]:              Solar_Radiation  Humidity  Temperature  Wind_Speed

TimeStamp

2016-03-01 00:00:00 1 63.3 43.7 1.7

2016-03-01 00:05:00 1 63.5 43.7 0.0

2016-03-01 00:10:00 1 63.9 43.5 0.0

2016-03-01 00:15:00 1 63.0 43.8 1.7

2016-03-01 00:20:00 1 63.0 43.7 0.6

In [29]: Campus_Data=Campus_Data.resample('10T',how='mean')

#Resamples to align timestamps with 10 minute intervals

#by taking means of intermediate variables

Campus_Data.head()

Out[29]:              Solar_Radiation  Humidity  Temperature  Wind_Speed

TimeStamp

2016-03-01 00:00:00 1 63.40 43.70 0.85

2016-03-01 00:10:00 1 63.45 43.65 0.85

2016-03-01 00:20:00 1 62.95 43.70 0.85

2016-03-01 00:30:00 1 62.60 43.80 1.70

2016-03-01 00:40:00 1 63.00 43.50 0.85

In [30]: plot_3=Campus_Data.plot(figsize=[20,8])


Merging Data

Moving closer to implementing the ANN and linear regression, we now merge the two data sets: one, the internal temperature and humidity data collected from the sensor network in Hunt; two, the external temperature, humidity, wind speed, and solar radiation data collected from the Centre for Building Performance and Diagnostics. We do this with an inner database join, which only combines instances from the two dataframes that contain data from the same instant in time.

In [31]: merged_matrix=Campus_Data.join(sensordata_pd, how='inner', rsuffix='Time_Stamp')
         #Performs inner join
         merged_matrix.head()

Out[31]:                      Solar_Radiation  Humidity  Temperature  Wind_Speed  Temp_1  Hum_1
         2016-04-22 23:30:00                1     89.25        60.00        2.55   78.44   36.2
         2016-04-22 23:40:00                1     89.30        59.80        4.20   79.88   36.4
         2016-04-22 23:50:00                1     88.85        59.65        3.90   79.70   36.4
         2016-04-23 00:00:00                1     88.45        59.50        3.35   79.88   36.3
         2016-04-23 00:10:00                1     87.90        59.40        3.40   79.88   36.2

In [32]: merged_matrix.shape

Out[32]: (1430, 6)

In [33]: merged_matrix= merged_matrix.dropna(axis=0, how='any', thresh=None, \
         subset=None, inplace=False)

#Drops Nan/Nat values (Sanity Check)

merged_matrix.head()


Out[33]:                      Solar_Radiation  Humidity  Temperature  Wind_Speed  Temp_1  Hum_1
         2016-04-22 23:30:00                1     89.25        60.00        2.55   78.44   36.2
         2016-04-22 23:40:00                1     89.30        59.80        4.20   79.88   36.4
         2016-04-22 23:50:00                1     88.85        59.65        3.90   79.70   36.4
         2016-04-23 00:00:00                1     88.45        59.50        3.35   79.88   36.3
         2016-04-23 00:10:00                1     87.90        59.40        3.40   79.88   36.2

In [34]: print merged_matrix.shape

(1430, 6)

Next, in order to understand the time-series behavior of each of the predictor/input variables, we plot the observed values of these variables with respect to time.

In [35]: plot_4=merged_matrix.plot(figsize=[20,8])

Artificial Neural Network

Now, we import libraries required to implement the Artificial Neural Network.

In [203]: import sys

import numpy.random

import random

In [204]: traintestData_raw = merged_matrix[['Solar_Radiation','Humidity',\
          'Temperature','Wind_Speed','Temp_1','Hum_1']]

traintestData_raw.head()

Out[204]: Solar Radiation Humidity Temperature Wind Speed \

2016-04-22 23:30:00 1 89.25 60.00 2.55

2016-04-22 23:40:00 1 89.30 59.80 4.20


2016-04-22 23:50:00 1 88.85 59.65 3.90

2016-04-23 00:00:00 1 88.45 59.50 3.35

2016-04-23 00:10:00 1 87.90 59.40 3.40

Temp 1 Hum 1

2016-04-22 23:30:00 78.44 36.2

2016-04-22 23:40:00 79.88 36.4

2016-04-22 23:50:00 79.70 36.4

2016-04-23 00:00:00 79.88 36.3

2016-04-23 00:10:00 79.88 36.2

In [205]: len(traintestData_raw.index)

Out[205]: 1430

We initialize training data to be roughly 90% of the samples. Testing is carried out on the remaining 10%.

In [206]: trainData=traintestData_raw.loc['20160422':'20160501']

len(trainData.index)

Out[206]: 1080

The sigmoidal unit is the transformation function of each node within the ANN. That is, it is the rule that governs the output value of a node given an input equal to the linear combination of nodal values from the previous layer. Sigmoidal units are not zero-centred and are prone to saturation for large input values. However, input magnification to the extent that the sigmoid unit would return a zero value and kill off that branch of the network is unlikely in ANNs with one hidden layer. For a deeper neural network, other activation functions like tanh, ReLU, and leaky ReLU might be considered. We first define the sigmoid function for the forward-propagation section of the ANN, and the derivative of the sigmoid for use in the gradient back-propagation section of the ANN.

In [207]: # sigmoid unit

def sigmoid(x):

return 1.0 / (1 + np.exp(-x))

# derivative of sigmoid unit

def derivSigmoid(x):

return x * (1 - x)
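The alternative activations mentioned above could be written in the same style as sigmoid(); a hedged sketch with standard textbook definitions (these functions are not part of the report's network):

```python
import numpy as np

# Standard activation functions and their derivatives. As with derivSigmoid,
# derivTanh takes the activation's *output*, not its input.

def tanh(x):
    return np.tanh(x)

def derivTanh(y):
    # derivative of tanh expressed in terms of its output y
    return 1.0 - y ** 2

def relu(x):
    return np.maximum(0.0, x)

def derivRelu(x):
    # subgradient: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

def leakyRelu(x, alpha=0.01):
    # small negative slope avoids "dead" units entirely
    return np.where(x > 0, x, alpha * x)
```

ReLU-family units avoid the saturation issue noted above but are not bounded to (0, 1), so the target normalization used later would need rethinking.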

Over the output layer, we compute the L2 norm between predicted output variables and actual output variables. In an ANN, we update the weight vectors at each iteration with the objective of minimising this L2 norm over the output layer. Below, we define a function to calculate the output error at the end of each iteration of the ANN.

In [208]: def getSquareError(outputError):

result = 0

(rows, cols) = outputError.shape

for i in range(rows):

result += outputError[i, cols-1]**2

return result
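The loop above sums the squares of the last column of the error array; with a single output variable this is the full squared L2 loss. A hedged sketch of a vectorized NumPy equivalent (the name getSquareErrorVec is ours):

```python
import numpy as np

# Vectorized equivalent of the getSquareError() loop: sum of squares of the
# last column of the error array.
def getSquareErrorVec(outputError):
    return float(np.sum(outputError[:, -1] ** 2))

# Loop version from the report, reproduced for comparison.
def getSquareError(outputError):
    result = 0
    (rows, cols) = outputError.shape
    for i in range(rows):
        result += outputError[i, cols - 1] ** 2
    return result

err = np.array([[0.5], [-0.25], [1.0]])
```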

Below, we define the procedure for forward propagation of inputs through the ANN, which undergo a series of nodal transformations and result in an output. In directional terms, back-propagation is the exact opposite. Subsequent to forward propagation, we calculate the losses over the output layer, after which we propagate the gradient of the loss backward through the network. This process uses the chain rule, and ends with an expression for the gradient of the loss function with respect to each of the weights. Below, we lay out the logic for these two essential components of the ANN algorithm within the function learnProcess.


In [209]: def learnProcess(trainExamples, resultExamples,i):

exampleNum = len(trainExamples)

attrNum = len(trainExamples[0])

resultNum = len(resultExamples[0])

# weight array from input to hidden layer

w0To1 = (np.random.random((attrNum, NODENUM_HL))* RANDOM_RAGE - 0.5 * RANDOM_RAGE)

# weight array from hidden layer to output

w1To2 = (np.random.random((NODENUM_HL, resultNum))* RANDOM_RAGE - 0.5 * RANDOM_RAGE)

squareError = getSquareError(resultExamples)

layer0 = trainExamples

layer1 = sigmoid(np.dot(trainExamples, w0To1))

layer2 = sigmoid(np.dot(layer1, w1To2))

oError = resultExamples - layer2

curError = getSquareError(oError)

# print(curError)

n = 1

print("Inital square error:", squareError)

# back propagation process

while (curError < squareError and n < ITERATION_NUM):

squareError = curError

# case1 training rule for output unit weight

deltaL2 = LEARNING_RATE * oError * derivSigmoid(layer2)

#dL/dx Calculates the derivative of the loss function over

w1To2 += np.dot(layer1.T, deltaL2)

hError = np.dot(deltaL2, w1To2.T)

# case2 training rule for hidden unit weight

deltaL1 = hError * derivSigmoid(layer1)

w0To1 += LEARNING_RATE * np.dot(layer0.T, deltaL1)

layer0 = trainExamples

layer1 = sigmoid(np.dot(trainExamples, w0To1))

layer2 = sigmoid(np.dot(layer1, w1To2))

oError = resultExamples - layer2

curError = getSquareError(oError)

n += 1

print("Total iteration:", n)

print("Final square error:", curError)

node_err[i-1][0]=i

node_err[i-1][1]=curError

return (w0To1, w1To2)
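The loop inside learnProcess can be illustrated on synthetic data. This is a hedged toy sketch: the shapes and update rules mirror the report's code, but the data, seed, hidden-layer size, and learning rate are made up for illustration.

```python
import numpy as np

np.random.seed(0)
X = np.random.random((50, 3))                           # 50 examples, 3 inputs
Y = (X.sum(axis=1, keepdims=True) > 1.5).astype(float)  # toy binary target

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

H, lr = 4, 0.01                                         # hidden nodes, step size
w0To1 = np.random.random((3, H)) * 0.1 - 0.05           # input -> hidden weights
w1To2 = np.random.random((H, 1)) * 0.1 - 0.05           # hidden -> output weights

losses = []
for _ in range(2000):
    layer1 = sigmoid(X.dot(w0To1))                      # forward pass
    layer2 = sigmoid(layer1.dot(w1To2))
    oError = Y - layer2
    losses.append(float(np.sum(oError ** 2)))
    deltaL2 = lr * oError * layer2 * (1 - layer2)       # output-layer gradient
    w1To2 += layer1.T.dot(deltaL2)
    hError = deltaL2.dot(w1To2.T)                       # error pushed back
    deltaL1 = hError * layer1 * (1 - layer1)            # hidden-layer gradient
    w0To1 += lr * X.T.dot(deltaL1)
```

With a small step size the squared loss falls steadily, which is the behavior the while-loop guard in learnProcess checks for.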

We carry out ANN training individually for temperature and humidity prediction. The following section consists of the training process for temperature prediction. We use a learning rate of 0.001 (also called the step size) for altering weights at each iteration, with the objective of minimising the squared loss (L2 norm) over the output layer; this iteration is carried out through gradient descent. Further, we loop over hidden layer sizes (in terms of the number of nodes) to determine the hidden layer size that minimises training error.

Temperature Training

In [210]: # learning rate

LEARNING_RATE = 0.001 #Gradient descent step size

RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization

# iteration number

ITERATION_NUM = 150000

node_err=np.ones((10,2))

for i in range(1,11):

# the number of nodes in hidden layer

NODENUM_HL = i

trainData = trainData.dropna(axis = 0, how = 'any',\

thresh = None, subset = None, inplace = False)

train_Data = trainData.as_matrix(columns = ['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

result_Data = trainData.as_matrix(columns = ['Temp_1'])

maxValue = max(result_Data)

minValue = min(result_Data)

result_Data = (result_Data - minValue) / (maxValue - minValue)

(w0To1, w1To2) = learnProcess(train_Data, result_Data,i)

node_err_temp=node_err

print node_err_temp

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 150000)

(’Final square error:’, 18.325206650737716)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 150000)

(’Final square error:’, 18.104710918933801)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 18278)

(’Final square error:’, 9.4791315777844964)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 32415)

(’Final square error:’, 9.4246533736084572)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 24781)

(’Final square error:’, 9.2209067593040288)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 21527)

(’Final square error:’, 9.449877602213494)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 14510)

(’Final square error:’, 9.9136854026428249)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 29072)

(’Final square error:’, 9.3574724636432656)

(’Inital square error:’, 274.2273341836725)


(’Total iteration:’, 18788)

(’Final square error:’, 9.540072102029777)

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 36994)

(’Final square error:’, 9.1961200425312981)

[[ 1. 18.32520665]

[ 2. 18.10471092]

[ 3. 9.47913158]

[ 4. 9.42465337]

[ 5. 9.22090676]

[ 6. 9.4498776 ]

[ 7. 9.9136854 ]

[ 8. 9.35747246]

[ 9. 9.5400721 ]

[ 10. 9.19612004]]

In [211]: node_err_temp_pd = pd.DataFrame(node_err_temp\

,columns=['No_of_Nodes','Error'])

node_err_temp_pd

Out[211]: No of Nodes Error

0 1 18.325207

1 2 18.104711

2 3 9.479132

3 4 9.424653

4 5 9.220907

5 6 9.449878

6 7 9.913685

7 8 9.357472

8 9 9.540072

9 10 9.196120

Now, we visualize the error with respect to hidden layer size to determine the size that results in the least error.

In [420]: plot_5=node_err_temp_pd.plot(x='No_of_Nodes',y='Error',figsize=[20,8])

plot_5.set_ylabel('Error', color='b')

Out[420]: <matplotlib.text.Text at 0x112d3fd10>


From the table, a 5-node hidden layer gives close to the minimum training error (9.221, marginally above the 9.196 achieved with 10 nodes), so we proceed with 5 nodes as the more parsimonious design.
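The best size can also be read off the (nodes, error) array programmatically rather than from the plot. A hedged sketch using the training errors tabulated above; note that np.argmin selects 10 nodes (error 9.196), marginally below the 5-node design:

```python
import numpy as np

# Training errors per hidden-layer size, copied from the printed table above.
node_err_temp = np.array([
    [1, 18.32520665], [2, 18.10471092], [3, 9.47913158], [4, 9.42465337],
    [5, 9.22090676], [6, 9.4498776], [7, 9.9136854], [8, 9.35747246],
    [9, 9.5400721], [10, 9.19612004],
])

# Row with the minimum error; column 0 holds the node count.
best_nodes = int(node_err_temp[np.argmin(node_err_temp[:, 1]), 0])
```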

Setting up Testing Data - Temperature

Recalibration of Weights After Considering Optimal Nodal Design We now recalibrate weights for the optimal hidden layer nodal design.

In [260]: # learning rate

LEARNING_RATE = 0.001 #Gradient descent step size

RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization

# iteration number

ITERATION_NUM = 150000

node_err=np.ones((10,2))

# the number of nodes in hidden layer

NODENUM_HL = 5

trainData = trainData.dropna(axis = 0, how = 'any',\

thresh = None, subset = None, inplace = False)

train_Data = trainData.as_matrix(columns = ['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

result_Data = trainData.as_matrix(columns = ['Temp_1'])

maxValue = max(result_Data)

minValue = min(result_Data)

result_Data = (result_Data - minValue) / (maxValue - minValue)

(w0To1, w1To2) = learnProcess(train_Data, result_Data,NODENUM_HL)

print node_err

(’Inital square error:’, 274.2273341836725)

(’Total iteration:’, 12772)

(’Final square error:’, 10.157420969948955)

[[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 5. 10.15742097]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]]

We now access roughly 10% of the data, spanning 05/02/2016 to 05/04/2016, and assign it to the test data variable. Testing will be carried out on this variable.

In [261]: testData=traintestData_raw.loc['20160502':'20160504']

testData.head()

Out[261]: Solar Radiation Humidity Temperature Wind Speed \

2016-05-02 00:00:00 1 91.95 60.25 0

2016-05-02 00:20:00 1 92.60 60.15 0

2016-05-02 00:30:00 1 92.95 60.05 0


2016-05-02 00:40:00 1 93.25 59.90 0

2016-05-02 00:50:00 1 93.70 59.90 0

Temp 1 Hum 1

2016-05-02 00:00:00 79.70 37.0

2016-05-02 00:20:00 79.52 36.9

2016-05-02 00:30:00 81.86 36.7

2016-05-02 00:40:00 79.70 37.0

2016-05-02 00:50:00 79.52 36.1

We now propagate this test data forward through the ANN with the optimised weights, and find the loss over the normalized parameters.

In [262]: resultExamples_temptest=testData.as_matrix(columns=['Temp_1'])

maxValue = max(resultExamples_temptest)

minValue = min(resultExamples_temptest)

resultExamples_temptest = (resultExamples_temptest - minValue) / (maxValue - minValue)

testData = testData.as_matrix(columns = \

['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

layer0 = testData

layer1 = sigmoid(np.dot(testData, w0To1))

layer2 = sigmoid(np.dot(layer1, w1To2))

oError = resultExamples_temptest - layer2

curError = getSquareError(oError)

print("Final square error:", curError)

(’Final square error:’, 9.6640104462032959)

We now map the final parameter outputs predicted by the ANN against the actual values present in the data.

In [263]: temp_ANN_pred=pd.DataFrame({'Predicted_Temp':layer2[:,0],'Actual_Temp':resultExamples_temptest[:,0]})

temp_ANN_pred.head()

Out[263]: Actual Temp Predicted Temp

0 0.40 0.404686

1 0.35 0.402527

2 1.00 0.401133

3 0.40 0.399618

4 0.35 0.398476

In [264]: testData=traintestData_raw.loc['20160502':'20160504']

trainData=traintestData_raw.loc['20160422':'20160501']

#Reinitializes test and train data to a dataframe

In [265]: temp_ANN_pred.index=testData.index

#Assigns timestamp index

We now visualize the actual versus predicted normalized values as output by the ANN. Surprisingly, the fit is a poor approximation of the actual parameter values. This is possibly a result of overfitting during training, given that training accuracy for the 5-node hidden layer design was far superior to the accuracy observed here.


In [424]: plot_ANNTemp=temp_ANN_pred.plot(figsize=[20,8])

plot_ANNTemp.set_ylabel('Internal Temperature (Normalized)', color='b')

plot_ANNTemp.set_xlabel('Time Series', color='b')

Out[424]: <matplotlib.text.Text at 0x118bb31d0>

For humidity, we follow the same training process as that used for temperature.

Humidity Training

In [235]: # learning rate

LEARNING_RATE = 0.001 #Gradient descent step size

RANDOM_RAGE = 0.1

# the number of nodes in hidden layer

NODENUM_HL = 5

# iteration number

ITERATION_NUM = 150000

node_err=np.ones((10,2))

for i in range(1,11):

# the number of nodes in hidden layer

NODENUM_HL = i

trainData = trainData.dropna(axis = 0, how = 'any',\

thresh = None, \

subset = None, inplace = False)

train_Data = trainData.as_matrix(columns = \

['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

result_Data = trainData.as_matrix(columns = ['Hum_1'])

maxValue = max(result_Data)

minValue = min(result_Data)

result_Data = (result_Data - minValue) / (maxValue - minValue)

(w0To1, w1To2) = learnProcess(train_Data, result_Data,i)

node_err_hum=node_err

print node_err_hum


(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 15828)

(’Final square error:’, 27.167038933471073)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 34874)

(’Final square error:’, 25.487413711592971)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2306)

(’Final square error:’, 26.067411243902523)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2557)

(’Final square error:’, 21.480609235375173)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2406)

(’Final square error:’, 22.163900227937674)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2222)

(’Final square error:’, 21.47789644120682)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2798)

(’Final square error:’, 23.386632874286729)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2970)

(’Final square error:’, 23.648404669961021)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2426)

(’Final square error:’, 18.294206465652621)

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2463)

(’Final square error:’, 18.24197609913535)

[[ 1. 27.16703893]

[ 2. 25.48741371]

[ 3. 26.06741124]

[ 4. 21.48060924]

[ 5. 22.16390023]

[ 6. 21.47789644]

[ 7. 23.38663287]

[ 8. 23.64840467]

[ 9. 18.29420647]

[ 10. 18.2419761 ]]

In [236]: node_err_hum_pd = pd.DataFrame(node_err_hum,columns=['No. of Nodes','Error'])

node_err_hum_pd

Out[236]: No. of Nodes Error

0 1 27.167039

1 2 25.487414

2 3 26.067411

3 4 21.480609

4 5 22.163900

5 6 21.477896

6 7 23.386633

7 8 23.648405

8 9 18.294206

9 10 18.241976


In [418]: plot_6=node_err_hum_pd.plot(x='No. of Nodes',y='Error',figsize=[20,8])

plot_6.set_ylabel('Error', color='b')

Out[418]: <matplotlib.text.Text at 0x118b8b510>

Setting up Testing Data - Humidity

Recalibration of Weights After Considering Optimal Nodal Design We now recalibrate weights for the optimal hidden layer nodal design.

In [280]: # learning rate

LEARNING_RATE = 0.001 #Gradient descent step size

RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization

# iteration number

ITERATION_NUM = 150000

node_err=np.ones((10,2))

# the number of nodes in hidden layer

NODENUM_HL = 10

trainData = trainData.dropna(axis = 0, how = 'any',\

thresh = None, subset = None, inplace = False)

train_Data = trainData.as_matrix(columns = ['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

result_Data = trainData.as_matrix(columns = ['Hum_1'])

maxValue = max(result_Data)

minValue = min(result_Data)

result_Data = (result_Data - minValue) / (maxValue - minValue)

(w0To1, w1To2) = learnProcess(train_Data, result_Data,NODENUM_HL)

print node_err

(’Inital square error:’, 269.30661186481473)

(’Total iteration:’, 2919)

(’Final square error:’, 17.786512335549215)


[[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 1. 1. ]

[ 10. 17.78651234]]

We now propagate test data forward through the ANN with the optimised weights, and find the loss over the normalized parameters.

In [281]: resultExamples_temphum=testData.as_matrix(columns=['Hum_1'])

maxValue = max(resultExamples_temphum)

minValue = min(resultExamples_temphum)

resultExamples_temphum = (resultExamples_temphum - minValue) / (maxValue - minValue)

testData = testData.as_matrix(columns = \

['Temperature', 'Humidity',\

'Wind_Speed', 'Solar_Radiation'])

layer0 = testData

layer1 = sigmoid(np.dot(testData, w0To1))

layer2 = sigmoid(np.dot(layer1, w1To2))

oError = resultExamples_temphum - layer2

curError = getSquareError(oError)

print("Final square error:", curError)

(’Final square error:’, 29.219680611924691)

We now map the final parameter outputs predicted by the ANN against the actual values present in the data.

In [282]: hum_ANN_pred=pd.DataFrame({'Predicted_Hum':layer2[:,0],'Actual_Hum':resultExamples_temphum[:,0]})

hum_ANN_pred.head()

Out[282]: Actual Hum Predicted Hum

0 0.878505 0.573453

1 0.869159 0.575152

2 0.850467 0.575898

3 0.878505 0.576313

4 0.794393 0.577679

In [283]: testData=traintestData_raw.loc['20160502':'20160504']

trainData=traintestData_raw.loc['20160422':'20160501']

#Reinitializes test and train data to a dataframe

In [284]: hum_ANN_pred.index=testData.index

#Assigns timestamp index

Again, the ANN outputs a very inaccurate prediction trend. This is again a possible side effect of overfitting, given that ten hidden nodes were used and cross-validation was not carried out.
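Cross-validation for this data would need to respect time ordering: randomly shuffled k-fold splits would leak future observations into training. A hedged sketch of an expanding-window split (the function name and fold count are ours; 1430 matches the merged sample count):

```python
import numpy as np

# Expanding-window cross-validation: each fold trains on an initial segment
# of the series and validates on the segment immediately following it.
def expanding_window_splits(n_samples, n_folds):
    fold = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield np.arange(0, k * fold), np.arange(k * fold, (k + 1) * fold)

splits = list(expanding_window_splits(1430, 4))
```

Averaging validation error across such folds would give a less optimistic basis for picking the hidden-layer size than training error alone.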


In [425]: plot_ANNhum=hum_ANN_pred.plot(figsize=[20,8])

plot_ANNhum.set_ylabel('Internal Humidity (Normalized)', color='b')

plot_ANNhum.set_xlabel('Time Series', color='b')

Out[425]: <matplotlib.text.Text at 0x119720c10>

Linear Regression (Causal Inference)

Having completed the ANN design and prediction, we now implement a linear regression model, primarily for causal inference but also as a learning experiment in predictive modeling. First, we seek to understand the relationship between the predictor variables (inputs) and the response variables (outputs). To do this, we visualize the relationship of the input variables with each of internal temperature and humidity.

In [47]: plot_7=trainData.plot(kind='scatter',x='Temp_1',y='Solar_Radiation',figsize=[20,8])

In [48]: plot_8=trainData.plot(kind='scatter',x='Temp_1',y='Temperature',figsize=[20,8])

In [49]: plot_9=trainData.plot(kind='scatter',x='Temp_1',y='Humidity',figsize=[20,8])

In [50]: plot_10=trainData.plot(kind='scatter',x='Temp_1',y='Wind_Speed',figsize=[20,8])

In [51]: plot_11=trainData.plot(kind='scatter',x='Hum_1',y='Solar_Radiation',figsize=[20,8])

In [52]: plot_12=trainData.plot(kind='scatter',x='Hum_1',y='Temperature',figsize=[20,8])

In [53]: plot_13=trainData.plot(kind='scatter',x='Hum_1',y='Humidity',figsize=[20,8])

In [54]: plot_14=trainData.plot(kind='scatter',x='Hum_1',y='Wind_Speed',figsize=[20,8])

The relationships between the predictor variables and Hum 1/Temp 1 therefore appear to follow a higher-order polynomial form. We will test polynomial regression models of different orders to determine the one that gives the best fit.
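One way to fit such higher-order models with the same least-squares machinery is to expand the design matrix with element-wise powers of each predictor. A hedged sketch (the function name and degree are illustrative, not from the report):

```python
import numpy as np

# Build a polynomial design matrix: a constant column followed by each
# predictor raised to powers 1..degree (element-wise, no cross terms).
def polynomial_design(X, degree):
    n = X.shape[0]
    cols = [np.ones((n, 1))]          # intercept column
    for d in range(1, degree + 1):
        cols.append(X ** d)           # element-wise powers of every predictor
    return np.hstack(cols)

X = np.array([[2.0, 3.0], [4.0, 5.0]])
D = polynomial_design(X, 2)           # columns: 1, x1, x2, x1^2, x2^2
```

The resulting matrix D can be passed to np.linalg.lstsq exactly like the first-order design matrix below.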

Linear Regression

The objective here is to estimate internal temperature and humidity given the predictor variable values. The equations are as follows:

Internal Temperature = a0 + a1(Solar Radiation) + a2(Temperature) + a3(Humidity) + a4(Wind Speed)

Internal Humidity = a0 + a1(Solar Radiation) + a2(Temperature) + a3(Humidity) + a4(Wind Speed)

In [304]: testData=traintestData_raw.loc['20160502':'20160504']

trainData=traintestData_raw.loc['20160422':'20160501']

In [305]: trainData_pred=trainData[['Solar_Radiation','Temperature','Humidity','Wind_Speed']]

trainData_pred.head()


Out[305]: Solar Radiation Temperature Humidity Wind Speed

2016-04-22 23:30:00 1 60.00 89.25 2.55

2016-04-22 23:40:00 1 59.80 89.30 4.20

2016-04-22 23:50:00 1 59.65 88.85 3.90

2016-04-23 00:00:00 1 59.50 88.45 3.35

2016-04-23 00:10:00 1 59.40 87.90 3.40

In [306]: trainData_pred.insert(0, 'Ones', 1, allow_duplicates=True)

#Adding a constant term that would correspond to a0

trainData_pred.head()

Out[306]: Ones Solar Radiation Temperature Humidity Wind Speed

2016-04-22 23:30:00 1 1 60.00 89.25 2.55

2016-04-22 23:40:00 1 1 59.80 89.30 4.20

2016-04-22 23:50:00 1 1 59.65 88.85 3.90

2016-04-23 00:00:00 1 1 59.50 88.45 3.35

2016-04-23 00:10:00 1 1 59.40 87.90 3.40

We now convert the training data into a numpy matrix to perform subsequent matrix operations.

In [307]: trainData_pred=trainData_pred.as_matrix(columns=None)

#Converting to a Numpy Matrix

trainData_pred

Out[307]: array([[ 1. , 1. , 60. , 89.25, 2.55],

[ 1. , 1. , 59.8 , 89.3 , 4.2 ],

[ 1. , 1. , 59.65, 88.85, 3.9 ],

...,

[ 1. , 1. , 60.15, 92.4 , 0. ],

[ 1. , 1. , 60.55, 91.65, 0.55],

[ 1. , 1. , 60.45, 91.45, 0. ]])

Temperature Tests We now do the same with the training data output for temperature, and convert this vector of values into a numpy array.

In [308]: trainData_out_temp=trainData[['Temp_1']]

#Accesses ’Temp_1’, which gives us the internal temperature in Hunt Library

trainData_out_temp.head()

Out[308]: Temp 1

2016-04-22 23:30:00 78.44

2016-04-22 23:40:00 79.88

2016-04-22 23:50:00 79.70

2016-04-23 00:00:00 79.88

2016-04-23 00:10:00 79.88

In [309]: trainData_out_temp=trainData_out_temp.as_matrix(columns=None)

#Converts to Matrix

trainData_out_temp[:10]

Out[309]: array([[ 78.44],

[ 79.88],

[ 79.7 ],

[ 79.88],

[ 79.88],

[ 79.7 ],


[ 79.7 ],

[ 79.7 ],

[ 79.88],

[ 79.7 ]])

First Order Multivariate Regression We now perform first order multivariate linear regression. Multivariate regression offers a convenient closed-form solution (stated below) that allows us to determine the coefficients corresponding to each input variable.

Solution: A = [(X'X)^-1]X'Y

where A is the vector of coefficients corresponding to each predictor variable, X is the matrix of predictor variables, and Y is the corresponding vector of outputs observed in each training example.

LstSq Function We use the np.linalg.lstsq(input, output) function to determine the values of the coefficients. This function takes the predictor and response variables as inputs, and returns an array of values consisting of the predictor variable coefficients (A), the L2 loss (equal to the sum of squared differences between predicted Y and actual Y), the rank of the matrix, and the singular values of the predictor variables.
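A toy example of the four return values, on made-up data where the exact answer is known (y = 1 + 2x); rcond=None silences a deprecation warning in newer NumPy versions:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # intercept column + x
y = np.array([[1.0], [3.0], [5.0]])                   # exactly y = 1 + 2x

# coeffs: coefficient vector; residuals: sum of squared residuals;
# rank: rank of X; svals: singular values of X
coeffs, residuals, rank, svals = np.linalg.lstsq(X, y, rcond=None)
```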

In [310]: a_funct_temp_o1 = np.linalg.lstsq(trainData_pred, trainData_out_temp)

In [311]: a_funct_temp_o1

# Returns coeff a values in index 0, l2 Norm Loss in index 1,

#rank of matrix in index 2, and 3 is singular values of trainData_pred

Out[311]: (array([[ 7.78318038e+01],

[ 5.28436528e-04],

[ 3.74469693e-02],

[ -8.06533948e-03],

[ 6.81049727e-02]]),

array([ 237.71098444]),

5,

array([ 1.02197701e+04, 2.65591549e+03, 4.77501272e+02,

4.75429041e+01, 2.99992583e+00]))

Next, we calculate the predicted values as the dot product between the coefficients and the test predictor variables.

In [312]: testData=traintestData_raw.loc['20160502':'20160504']

In [313]: testData_pred=testData[['Solar_Radiation','Temperature','Humidity','Wind_Speed']]

testData_pred.insert(0, 'Ones', 1, allow_duplicates=True)

testData_pred.head()

Out[313]: Ones Solar Radiation Temperature Humidity Wind Speed

2016-05-02 00:00:00 1 1 60.25 91.95 0

2016-05-02 00:20:00 1 1 60.15 92.60 0

2016-05-02 00:30:00 1 1 60.05 92.95 0

2016-05-02 00:40:00 1 1 59.90 93.25 0

2016-05-02 00:50:00 1 1 59.90 93.70 0

In [314]: temp_predvect_o1=np.dot(testData_pred,a_funct_temp_o1[0])

#Accesses the coefficient vector and computes its dot product with the test data

temp_predvect_o1

Out[314]: array([[ 79.34690418],

[ 79.33791701],

[ 79.33134944],

...,

[ 80.02731582],

[ 80.11145704],

[ 80.16450468]])

In [315]: testData_out_temp=testData[['Temp_1']]

testData_out_temp=testData_out_temp.as_matrix(columns=None)

We then store the actual and predicted values in a pandas dataframe, and assign the timestamp index of the test data to this dataframe.

In [316]: o1_predcomp_temp=pd.DataFrame({'Predicted_Temp':temp_predvect_o1[:,0],'Actual_Temp':testData_out_temp[:,0]})

o1_predcomp_temp.head()

Out[316]: Actual Temp Predicted Temp

0 79.70 79.346904

1 79.52 79.337917

2 81.86 79.331349

3 79.70 79.323313

4 79.52 79.319683

In [317]: o1_predcomp_temp.index=testData.index

o1_predcomp_temp.head()

Out[317]: Actual Temp Predicted Temp

2016-05-02 00:00:00 79.70 79.346904

2016-05-02 00:20:00 79.52 79.337917

2016-05-02 00:30:00 81.86 79.331349

2016-05-02 00:40:00 79.70 79.323313

2016-05-02 00:50:00 79.52 79.319683

We then use the linalg.norm function to find the norm of the difference between the two columns.

In [318]: numpy.linalg.norm(o1_predcomp_temp.Actual_Temp-o1_predcomp_temp.Predicted_Temp)

Out[318]: 8.3417961142492878
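This raw L2 norm can be converted into a root-mean-square error, which is easier to interpret in degrees. A hedged sketch: the test size of 350 is an assumption (1430 total samples minus 1080 training samples), and the helper name is ours.

```python
import numpy as np

# RMSE from an L2 norm of residuals: divide by the square root of the
# number of test samples.
def rmse_from_norm(l2_norm, n):
    return l2_norm / np.sqrt(n)

rmse = rmse_from_norm(8.3417961142492878, 350)
```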


Now, we plot the two variables in the dataframe to visualize and compare the actual vs predicted values for temperature over that time span. Clearly, there are significant issues with the accuracy of prediction. This could be due to either of two factors:

Error Source 1: The model does not allow sufficient flexibility, in terms of higher order relationships between predictor and response variables. A component of the error is likely due to this, given the non-linear relationships between individual predictors and response variables visualized earlier.

Error Source 2: It is possible that the chosen subset of predictors is not an exhaustive set of factors influencing internal temperature. We know that internal HVAC data from Hunt Library is likely an important factor in determining the final internal temperature. This data has been difficult to obtain, and is therefore not factored into the model.

In [426]: plot_15=o1_predcomp_temp.plot(figsize=[20,8])

plot_15.set_ylabel('Internal Temperature', color='b')

plot_15.set_xlabel('Time Series', color='b')

Out[426]: <matplotlib.text.Text at 0x1197a9ad0>

Matrix Multiplication (To Understand the Underlying Mechanism) We now perform a matrix multiplication to verify the output of the linalg.lstsq() function used earlier. This computation uses methods for matrix inversion (linalg.inv), transposition (np.transpose()), and dot products (np.dot) to evaluate the closed-form solution for the coefficients specified at the start of this section. We observe that the coefficients calculated using this method are identical to those calculated by the linalg.lstsq method.

In [320]: a_mult_temp=np.dot(np.dot(linalg.inv(np.dot(trainData_pred.transpose(),\

trainData_pred)),trainData_pred.transpose())\

,trainData_out_temp)

#Computes coefficient vector

In [321]: a_mult_temp

Out[321]: array([[ 7.78318038e+01],

[ 5.28436528e-04],

[ 3.74469693e-02],

[ -8.06533948e-03],

[ 6.81049727e-02]])


Humidity Tests

In [322]: testData=traintestData_raw.loc['20160502':'20160504']

trainData=traintestData_raw.loc['20160422':'20160501']

We now carry out identical tests for humidity, and observe a similar fit, with considerable loss over the predictions. This strengthens our belief in the need for a higher order representation, in addition to the need for a more expressive set of predictor variables, especially for temperature.

In [323]: trainData_out_hum=trainData[['Hum_1']]

trainData_out_hum.head()

Out[323]: Hum 1

2016-04-22 23:30:00 36.2

2016-04-22 23:40:00 36.4

2016-04-22 23:50:00 36.4

2016-04-23 00:00:00 36.3

2016-04-23 00:10:00 36.2

In [324]: trainData_out_hum=trainData_out_hum.as_matrix(columns=None)

trainData_out_hum[:10]

Out[324]: array([[ 36.2],

[ 36.4],

[ 36.4],

[ 36.3],

[ 36.2],

[ 35.8],

[ 36. ],

[ 35.5],

[ 34.4],

[ 34.1]])

First Order Multivariate Regression

LstSq Function

In [328]: a_funct_hum_o1 = np.linalg.lstsq(trainData_pred, trainData_out_hum)

In [329]: a_funct_hum_o1

Out[329]: (array([[ -1.21793925e+01],

[ -2.72344512e-03],

[ 4.22477246e-01],

[ 2.46070242e-01],

[ 2.18038405e-01]]),

array([ 2445.52601234]),

5,

array([ 1.02197701e+04, 2.65591549e+03, 4.77501272e+02,

4.75429041e+01, 2.99992583e+00]))

In [330]: hum_predvect_o1=np.dot(testData_pred,a_funct_hum_o1[0])

#Prediction based on the estimated parameters

hum_predvect_o1

Out[330]: array([[ 35.8982968 ],

[ 36.01599474],

[ 36.0598716 ],

...,

[ 26.49381391],

[ 26.38958072],

[ 26.57566049]])

(output truncated: 350 predicted humidity values)

In [331]: testData_out_hum=testData[['Hum_1']]

testData_out_hum=testData_out_hum.as_matrix(columns=None)

In [332]: o1_predcomp_hum=pd.DataFrame({'Predicted_Hum':hum_predvect_o1[:,0],\

'Actual_Hum':testData_out_hum[:,0]})

#Build the actual vs. predicted comparison dataframe for humidity

o1_predcomp_hum.head()

Out[332]: Actual Hum Predicted Hum

0 37.0 35.898297

1 36.9 36.015995

2 36.7 36.059872

3 37.0 36.070321

4 36.1 36.181053

In [334]: o1_predcomp_hum.index=testData.index

o1_predcomp_hum.head()

Out[334]: Actual Hum Predicted Hum

2016-05-02 00:00:00 37.0 35.898297

2016-05-02 00:20:00 36.9 36.015995

2016-05-02 00:30:00 36.7 36.059872

2016-05-02 00:40:00 37.0 36.070321

2016-05-02 00:50:00 36.1 36.181053

In [335]: numpy.linalg.norm(o1_predcomp_hum.Actual_Hum-o1_predcomp_hum.Predicted_Hum)

Out[335]: 37.057462760024826


The following plot visualizes the actual humidity versus the prediction.

In [427]: plot_16=o1_predcomp_hum.plot(figsize=[20,8])

plot_16.set_ylabel('Internal Humidity', color='b')

plot_16.set_xlabel('Time Series', color='b')

Out[427]: <matplotlib.text.Text at 0x113016e50>

Second Order Multivariate Regression

We now install Python's machine learning package sklearn, and use its preprocessing library to import the PolynomialFeatures and linear_model methods, which allow us to expand first order predictor variables to higher orders.

In [342]: from sklearn.preprocessing import PolynomialFeatures

from sklearn import linear_model

In [343]: testData=traintestData_raw.loc['20160502':'20160504']

trainData=traintestData_raw.loc['20160422':'20160501']

In [344]: poly = PolynomialFeatures(degree=2)

#sets order of regression

In [345]: trainData_pred[0]

#First Training Example: 1, SR, Temp, Hum, Wind Speed

Out[345]: array([ 1. , 1. , 60. , 89.25, 2.55])

In [350]: trainData_pred_o2=poly.fit_transform(trainData_pred)

testData_pred_o2=poly.fit_transform(testData_pred)

#Transforming traindata to specified order

In [347]: trainData_pred_o2[0]

#First Training example of transformed sample in the second order (Training)


Out[347]: array([ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,

6.00000000e+01, 8.92500000e+01, 2.55000000e+00,

1.00000000e+00, 1.00000000e+00, 6.00000000e+01,

8.92500000e+01, 2.55000000e+00, 1.00000000e+00,

6.00000000e+01, 8.92500000e+01, 2.55000000e+00,

3.60000000e+03, 5.35500000e+03, 1.53000000e+02,

7.96556250e+03, 2.27587500e+02, 6.50250000e+00])

In [351]: testData_pred_o2[0]

#First test example transformed to the second order (Testing)

Out[351]: array([ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,

6.02500000e+01, 9.19500000e+01, 0.00000000e+00,

1.00000000e+00, 1.00000000e+00, 6.02500000e+01,

9.19500000e+01, 0.00000000e+00, 1.00000000e+00,

6.02500000e+01, 9.19500000e+01, 0.00000000e+00,

3.63006250e+03, 5.53998750e+03, 0.00000000e+00,

8.45480250e+03, 0.00000000e+00, 0.00000000e+00])

The exploded space therefore consists of 21 predictor variables, allowing for greater model flexibility. We would thus expect a better fit from higher order regression than from the first order multivariate regression.
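The count of 21 features can be checked combinatorially: PolynomialFeatures with degree d applied to n input columns emits every monomial of total degree at most d, of which there are C(n + d, d). A quick sketch:

```python
from math import comb

def n_poly_features(n_inputs, degree):
    """Number of columns PolynomialFeatures(degree) emits for n_inputs
    input columns: all monomials of total degree <= degree, bias included."""
    return comb(n_inputs + degree, degree)

# 5 predictor columns (constant, solar radiation, temperature, humidity,
# wind speed) exploded to second order -> 21 features, matching the output
print(n_poly_features(5, 2))  # → 21
```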

Temperature We now follow a method similar to the one employed for first order regression to carry out second order regression.

In [348]: a_funct_temp_o2 = np.linalg.lstsq(trainData_pred_o2, trainData_out_temp)

#Calculating coefficient vector, residuals, rank, and singular values

In [349]: a_funct_temp_o2

Out[349]: (array([[ 2.32951991e+01],

[ 2.32951991e+01],

[ 1.56588442e-03],

[ 7.47454034e-02],

[ 6.48933406e-02],

[ 1.18408427e-01],

[ 2.32951991e+01],

[ 1.56588416e-03],

[ 7.47454033e-02],

[ 6.48933404e-02],

[ 1.18408427e-01],

[ -9.68622210e-07],

[ -1.85052626e-05],

[ -2.54900448e-05],

[ 3.03122179e-04],

[ -2.40441190e-04],

[ -1.18427276e-03],

[ -2.83913106e-03],

[ -4.88623051e-04],

[ -2.26885555e-04],

[ -1.45039047e-02]]),

array([], dtype=float64),

15,

array([ 7.45986678e+06, 3.55188748e+05, 1.95361515e+05,


1.20910776e+05, 4.58000766e+04, 1.77848563e+04,

6.44400964e+03, 3.52987078e+03, 1.11685798e+03,

6.59266099e+02, 3.16040890e+02, 8.76511721e+01,

4.58666434e+01, 5.27359788e+00, 5.69081560e-01,

6.44149164e-11, 3.62070457e-11, 2.46047174e-12,

3.79561191e-13, 7.49638951e-14, 3.40372875e-14]))

In [352]: temp_predvect_o2=np.dot(testData_pred_o2,a_funct_temp_o2[0])

#Prediction based on the estimated parameters

temp_predvect_o2

Out[352]: array([[ 79.26109153],

[ 79.23935727],

[ 79.22706007],

...,

[ 80.13932885],

[ 80.06719099],

[ 80.34040847]])

(output truncated: 350 predicted temperature values)

In [353]: temp_predvect_o2.size

#Sanity Check

Out[353]: 350

In [354]: trainData_out_temp.size

#Sanity Check

Out[354]: 1080

In [355]: o2_predcomp_temp=pd.DataFrame({'Predicted_Temp':temp_predvect_o2[:,0],\

'Actual_Temp':testData_out_temp[:,0]})

#Importing into a pandas dataframe

o2_predcomp_temp.head()

Out[355]: Actual Temp Predicted Temp

0 79.70 79.261092

1 79.52 79.239357

2 81.86 79.227060

3 79.70 79.215830

4 79.52 79.201193

In [357]: o2_predcomp_temp.index=testData.index

o2_predcomp_temp.head()

Out[357]: Actual Temp Predicted Temp

2016-05-02 00:00:00 79.70 79.261092

2016-05-02 00:20:00 79.52 79.239357

2016-05-02 00:30:00 81.86 79.227060

2016-05-02 00:40:00 79.70 79.215830

2016-05-02 00:50:00 79.52 79.201193

Below, we plot the time series of predicted vs. actual temperature. An immediate takeaway from this visualization is that the fit is more accurate than the one observed in the first order.

In [428]: plot_17=o2_predcomp_temp.plot(figsize=[20,8])

plot_17.set_ylabel('Internal Temperature', color='b')

plot_17.set_xlabel('Time Series', color='b')

Out[428]: <matplotlib.text.Text at 0x115fc59d0>

Humidity We now repeat the second order multivariate regression process for humidity.

In [360]: a_funct_hum_o2 = np.linalg.lstsq(trainData_pred_o2, \

trainData_out_hum)

#Calculating coefficient vector, residuals, rank, and singular values

In [361]: a_funct_hum_o2

Out[361]: (array([[ 3.01610187e+00],

[ 3.01610187e+00],

[ -4.39954966e-03],

[ -1.39898784e-02],

[ -1.88188131e-02],

[ 9.34283503e-01],

[ 3.01610187e+00],


[ -4.39954970e-03],

[ -1.39898784e-02],

[ -1.88188131e-02],

[ 9.34283503e-01],

[ 4.44596105e-06],

[ 4.21982402e-05],

[ 1.16691276e-05],

[ -7.32551283e-05],

[ 1.04690212e-03],

[ 5.49603421e-03],

[ -1.12300784e-02],

[ -1.41626278e-04],

[ -8.73702840e-03],

[ -6.46861709e-02]]),

array([], dtype=float64),

15,

array([ 7.45986678e+06, 3.55188748e+05, 1.95361515e+05,

1.20910776e+05, 4.58000766e+04, 1.77848563e+04,

6.44400964e+03, 3.52987078e+03, 1.11685798e+03,

6.59266099e+02, 3.16040890e+02, 8.76511721e+01,

4.58666434e+01, 5.27359788e+00, 5.69081560e-01,

6.44149164e-11, 3.62070457e-11, 2.46047174e-12,

3.79561191e-13, 7.49638951e-14, 3.40372875e-14]))

In [362]: hum_predvect_o2=np.dot(testData_pred_o2,a_funct_hum_o2[0])

#Prediction based on the estimated parameters

hum_predvect_o2

Out[362]: array([[ 36.94742507],

[ 37.06051328],

[ 37.09297625],

...,

[ 26.61551909],

[ 26.17752487],

[ 26.87165016]])

(output truncated: 350 predicted humidity values)

In [363]: hum_predvect_o2.size

Out[363]: 350

In [364]: o2_predcomp_hum=pd.DataFrame({'Predicted_Hum':hum_predvect_o2[:,0],\

'Actual_Hum':testData_out_hum[:,0]})

#Build the actual vs. predicted comparison dataframe for humidity

o2_predcomp_hum.head()

Out[364]: Actual Hum Predicted Hum

0 37.0 36.947425

1 36.9 37.060513

2 36.7 37.092976

3 37.0 37.081267

4 36.1 37.200566

In [365]: o2_predcomp_hum.index=testData.index

o2_predcomp_hum.head()

Out[365]: Actual Hum Predicted Hum

2016-05-02 00:00:00 37.0 36.947425

2016-05-02 00:20:00 36.9 37.060513

2016-05-02 00:30:00 36.7 37.092976

2016-05-02 00:40:00 37.0 37.081267

2016-05-02 00:50:00 36.1 37.200566

Notice, again, that the visualization below shows an improved second order fit for humidity compared to the one we observed in the first order regression.

In [429]: plot_18=o2_predcomp_hum.plot(figsize=[20,8],fontsize=12)

plt.legend(ncol=2)

plot_18.set_ylabel('Internal Humidity', color='b')

plot_18.set_xlabel('Time Series', color='b')

Out[429]: <matplotlib.text.Text at 0x117aea190>


Statistical Inference

We now lay out the framework for an ANOVA (Analysis of Variance) test which, given multiple possible predictive models, could be used to judge the relative significance of a model compared to the others (for example, a comparison between first and higher order regression, and whether one is significantly better than the other). We lay out the framework for carrying out an ANOVA F-test, but stop short of comparing multiple models with it given time constraints. Future work could include a robust ANOVA F-test comparing various predictive models to determine the one that best fits the population of the observed data. As mentioned previously, error can be due to one of two reasons:

1) The lack of expressiveness in a regression model

2) An incomplete set of predictor variables

These together comprise what we will refer to as regression error. The other component of error, not mentioned previously, is more vague and difficult to isolate; we therefore attribute it to unexplained “error”.
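The model comparison described above can be sketched as a nested-model F-test: the reduction in error sum of squares gained by the fuller model, per extra degree of freedom, is compared against the fuller model's per-degree error. This is a sketch only; the SSE and degrees-of-freedom numbers in the example are hypothetical placeholders, not results from this dataset:

```python
def nested_f_statistic(sse_reduced, sse_full, df_err_reduced, df_err_full):
    """F statistic for comparing a reduced model (e.g. first order) against
    a full model (e.g. second order) fitted to the same responses.
    df_err_* are the error degrees of freedom of each model."""
    extra_ss_per_df = (sse_reduced - sse_full) / (df_err_reduced - df_err_full)
    full_ss_per_df = sse_full / df_err_full
    return extra_ss_per_df / full_ss_per_df

# Hypothetical placeholder values; a large F favors the full (higher order) model
f = nested_f_statistic(sse_reduced=100.0, sse_full=60.0,
                       df_err_reduced=344, df_err_full=329)
```

The resulting statistic would be compared against an F distribution with (df_err_reduced - df_err_full, df_err_full) degrees of freedom to judge significance.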

First Order Regression Constants We first calculate the degrees of freedom. The regression degrees of freedom are given by the number of input variables in the first order regression.

In [367]: df_reg=a_funct_temp_o1[0].size

#Regression Degree of Freedom

df_reg

Out[367]: 5

The total degrees of freedom is equal to the training set size.

In [368]: df_tot=o1_predcomp_temp.index.size

df_tot

Out[368]: 350

The error degrees of freedom are given by the total degrees of freedom minus the regression degrees of freedom, minus 1.

In [369]: df_err=df_tot-df_reg-1

df_err

Out[369]: 344

Temperature Now, we calculate the regression and error sum of squared losses for the temperature regression model.

In [370]: SSerr_temp_o1=numpy.linalg.norm(o1_predcomp_temp.Actual_Temp-\

o1_predcomp_temp.Predicted_Temp)

#Norm of the residuals between predicted and actual temperatures (taken as SSerr)

SSerr_temp_o1

Out[370]: 8.3417961142492878

In [371]: SSreg_temp_o1=numpy.linalg.norm(o1_predcomp_temp.Predicted_Temp\

-np.mean(o1_predcomp_temp.Predicted_Temp))

#Norm of the deviations of the predictions from their mean (taken as SSreg)

SSreg_temp_o1

Out[371]: 7.3338875775485022


In [372]: SStot_temp_o1=SSerr_temp_o1+SSreg_temp_o1

#Total Sum of squared errors is equal to SSreg+SSerr

SStot_temp_o1

Out[372]: 15.67568369179779

We now compute the means of these squared errors, which give us the average squared error per degree of freedom.

In [373]: MSreg_temp_o1=SSreg_temp_o1/df_reg

#Mean of Squared Error for Regression

MSreg_temp_o1

Out[373]: 1.4667775155097005

In [374]: MSErr_temp_o1=SSerr_temp_o1/df_err

#Also termed as regression Variance S^2.

#The square root of this gives the root mean squared error.

MSErr_temp_o1

Out[374]: 0.024249407308864209

In [375]: F_temp_o1=MSreg_temp_o1/MSErr_temp_o1

#Computing the F-Statistic to calculate the significance of the model

F_temp_o1

Out[375]: 60.487149101311424

Given the mean squared errors for the explainable error (MSreg) and the unexplainable error (MSerr), the ANOVA F-test computes the F-statistic as the ratio of MSreg to MSerr. Large values of the F-statistic indicate large values of MSreg, meaning the fitted values are far from the overall mean and the expected response changes significantly along the regression plane, as is the case here. Therefore, there is strong evidence for implementing higher order models.

Next, we compute the R^2 goodness of fit, which, as expected, is not ideal.

In [376]: R_Sq_temp_01=SSreg_temp_o1/SStot_temp_o1

R_Sq_temp_01

Out[376]: 0.46785120966595645

Testing Individual Slopes (Temperature) We now compute the variance-covariance matrix. The diagonal elements of the variance-covariance matrix give us a measure of the sensitivity of the final output to a specific input, keeping all else constant. For example, cell 2x2 in the 'variance_vect_temp_o1' dataframe gives us the sensitivity of the internal temperature to the external humidity, keeping all else constant. The matrix, therefore, gives us a measure of causal inference: an ordered list of the factors that the internal temperature is most sensitive to.

In [413]: variance_vect_temp_o1=np.multiply(linalg.inv(np.dot(np.transpose(testData_pred),\

testData_pred)),MSErr_temp_o1)

variance_vect_temp_o1

Out[413]: array([[  2.88375044e-02,   8.46857825e-08,  -4.09911905e-04,  -7.83569224e-05,   6.86531017e-04],
                 [  8.46857825e-08,   2.02745805e-09,  -2.78077457e-08,   1.46927160e-08,  -2.50346951e-08],
                 [ -4.09911905e-04,  -2.78077457e-08,   6.87674723e-06,   4.59742149e-07,  -1.19205705e-05],
                 [ -7.83569224e-05,   1.46927160e-08,   4.59742149e-07,   6.40577242e-07,  -8.47686074e-07],
                 [  6.86531017e-04,  -2.50346951e-08,  -1.19205705e-05,  -8.47686074e-07,   5.12000045e-05]])

In [411]: variance_vect_temp_o1=pd.DataFrame({'Constant':variance_vect_temp_o1[:,0],\
              'Solar_Radiation':variance_vect_temp_o1[:,1],\
              'External_Temperature':variance_vect_temp_o1[:,2],\
              'External_Humidity':variance_vect_temp_o1[:,3],\
              'Wind Speed':variance_vect_temp_o1[:,4]})
          variance_vect_temp_o1

Out[411]:        Constant  External Humidity  External Temperature  Solar Radiation    Wind Speed
          0  2.883750e-02      -7.835692e-05         -4.099119e-04     8.468578e-08  6.865310e-04
          1  8.468578e-08       1.469272e-08         -2.780775e-08     2.027458e-09 -2.503470e-08
          2 -4.099119e-04       4.597421e-07          6.876747e-06    -2.780775e-08 -1.192057e-05
          3 -7.835692e-05       6.405772e-07          4.597421e-07     1.469272e-08 -8.476861e-07
          4  6.865310e-04      -8.476861e-07         -1.192057e-05    -2.503470e-08  5.120000e-05

Reading off the diagonal elements of the matrix above (whose rows and columns follow the design-matrix order: Constant, Solar Radiation, External Temperature, External Humidity, Wind Speed), the internal temperature's sensitivity to the predictor variables ranks as follows:

1) Wind Speed

2) External Temperature

3) External Humidity

4) Solar Radiation
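As a quick check, the diagonal entries can be sorted programmatically (values copied from the Out[413] array above, listed in the regression's design-matrix column order; the constant term is omitted):

```python
import pandas as pd

# Diagonal of variance_vect_temp_o1, in design-matrix column order.
diag = {'Solar_Radiation':      2.02745805e-09,
        'External_Temperature': 6.87674723e-06,
        'External_Humidity':    6.40577242e-07,
        'Wind_Speed':           5.12000045e-05}
ranking = pd.Series(diag).sort_values(ascending=False)
```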

Humidity

We carry out the same statistical inference process for the first-order humidity regression as the one used for temperature. We will therefore not go into specific detail of each step, but will note the results worth highlighting.

In [390]: SSerr_hum_o1=numpy.linalg.norm(o1_predcomp_hum.Actual_Hum-\
              o1_predcomp_hum.Predicted_Hum)
          #SSErr
          SSerr_hum_o1

Out[390]: 37.057462760024826

In [391]: SSreg_hum_o1=numpy.linalg.norm(o1_predcomp_hum.Predicted_Hum-\
              np.mean(o1_predcomp_hum.Predicted_Hum))
          SSreg_hum_o1

Out[391]: 60.539649317778931

In [392]: SStot_hum_o1=SSerr_hum_o1+SSreg_hum_o1

SStot_hum_o1

Out[392]: 97.597112077803757


In [393]: MSreg_hum_o1=SSreg_hum_o1/df_reg

MSreg_hum_o1

Out[393]: 12.107929863555785

In [394]: MSErr_hum_o1=SSerr_hum_o1/df_err

#Regression Variance S^2

MSErr_hum_o1

Out[394]: 0.10772518244193263

In [395]: F_hum_o1=MSreg_hum_o1/MSErr_hum_o1

F_hum_o1

Out[395]: 112.39646653726813

Again, notice that we arrive at a reasonably large value for the F-statistic, indicating that higher-order models could potentially improve on this model.

The first-order humidity regression gives a better fit than the first-order temperature regression.

In [396]: R_Sq_hum_01=SSreg_hum_o1/SStot_hum_o1

R_Sq_hum_01

Out[396]: 0.62030164652328168

Testing Individual Slopes (Humidity)

In [407]: testData_pred=testData_pred.as_matrix(columns=None)

In [408]: variance_vect_hum_o1=np.multiply(linalg.inv(np.dot(np.transpose(testData_pred)\
              ,testData_pred)),MSErr_hum_o1)
          variance_vect_hum_o1

Out[408]: array([[  1.28107272e-01,   3.76206777e-07,  -1.82098615e-03,  -3.48091549e-04,   3.04983450e-03],
                 [  3.76206777e-07,   9.00674749e-09,  -1.23532688e-07,   6.52706883e-08,  -1.11213733e-07],
                 [ -1.82098615e-03,  -1.23532688e-07,   3.05491528e-05,   2.04235123e-06,  -5.29557533e-05],
                 [ -3.48091549e-04,   6.52706883e-08,   2.04235123e-06,   2.84569018e-06,  -3.76574717e-06],
                 [  3.04983450e-03,  -1.11213733e-07,  -5.29557533e-05,  -3.76574717e-06,   2.27450088e-04]])

In [409]: variance_vect_hum_o1=pd.DataFrame({'Constant':variance_vect_hum_o1[:,0],\
              'Solar_Radiation':variance_vect_hum_o1[:,1],\
              'Temperature':variance_vect_hum_o1[:,2],\
              'Humidity':variance_vect_hum_o1[:,3],\
              'Wind Speed':variance_vect_hum_o1[:,4]})
          variance_vect_hum_o1

Out[409]:    Constant      Humidity  Solar Radiation  Temperature  Wind Speed
          0  0.128107 -3.480915e-04     3.762068e-07    -0.001821    0.003050
          1  0.000000  6.527069e-08     9.006747e-09    -0.000000   -0.000000
          2 -0.001821  2.042351e-06    -1.235327e-07     0.000031   -0.000053
          3 -0.000348  2.845690e-06     6.527069e-08     0.000002   -0.000004
          4  0.003050 -3.765747e-06    -1.112137e-07    -0.000053    0.000227


Reading off the diagonal elements of the underlying matrix in Out[408] (whose rows and columns follow the design-matrix order: Constant, Solar Radiation, External Temperature, External Humidity, Wind Speed), the internal humidity's sensitivity to the predictor variables ranks as follows:

1) Wind Speed

2) External Temperature

3) External Humidity

4) Solar Radiation

Note that the diagonal of a variance-covariance matrix consists of variances and therefore cannot be negative; the apparent negative entries on the diagonal of the Out[409] dataframe arise because pandas sorts the dictionary keys into alphabetical column order while the rows remain in the original design-matrix order, so the dataframe's diagonal no longer corresponds to the matrix diagonal.
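To formally test an individual slope, one divides the estimated coefficient by its standard error (the square root of the corresponding diagonal entry of the variance-covariance matrix) and compares the result against a t distribution with df_err degrees of freedom. A hedged sketch, using an illustrative coefficient value since the fitted coefficients are not shown in this section:

```python
import math

beta_hat = 0.012          # illustrative slope estimate (not from the notebook)
var_beta = 2.2745e-04     # e.g. the Wind Speed diagonal entry from Out[408]
t_stat = beta_hat / math.sqrt(var_beta)
# Compare |t_stat| against the critical value of a t distribution with
# df_err degrees of freedom (roughly 1.97 at the 5% level for large df).
```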
