final report - heating and cooling
TRANSCRIPT
Hunt Library Heating & Cooling 12745: AIS Project, Final Report
Aditya Chaganti, Alex Shu, Joseph Dryer, Yutong Guo 4/6/16
Executive Summary
A solution is only valuable if there was a problem to begin with. This may sound exceedingly simple, yet
many start-up companies and other businesses fail because they address a problem or market that simply
doesn’t exist. Our project has its origins in a complaint here and there about the indoor conditions at Hunt
Library. Our professor mentioned that some employees who work there keep heaters under their desks.
Some team members who study there started to recall being too hot or too cold often. User interviews
revealed it wasn’t just this – throughout the building, various employees were happy to explain to us the
thermal imbalance they experience and its negative effects. Thus, out of a variety of different problems
that we could solve in a semester, we decided Hunt Library was the choice – a real problem that affected
real people.
The heating and cooling of a building is complex, governed by a variety of natural phenomena like
outdoor temperature, wind speed, irradiation, as well as indoor factors such as occupancy, lighting,
computers, etc. Mechanical systems must add or remove heat from the building to balance these forces
and produce a comfortable environment. Our project doesn’t solve the problem in Hunt Library – we do
not suggest that valve C on air handling unit #1 be turned 15 degrees clockwise. Rather, we have produced
a method for assessing the problem and providing the information needed to mitigate it.
We designed and built a scalable sensor network that measures temperature and humidity at any
desired resolution via mobile, battery powered sensor nodes. We produced a data management system
that collects these measurements and stores them for later use. We created an analysis framework that
leverages our network measurements as well as mechanical systems data to identify driving factors for
internal conditions and also predicts them. Communication of results is achieved through our visualization
design – although not implemented, we have conceptually designed an application that will deliver real
time and historical data of indoor conditions and data analysis outcomes through intelligent graphics.
Through our system, a building manager can make data driven decisions to keep occupants
comfortable and increase energy efficiency. An office worker can predict indoor conditions and come to
work dressed appropriately.
While functional, we recognize that our system is a prototype. We suggest a future route that
would ensure a reliable, scalable, and functional system. This report documents not only our final design,
but our decision frameworks and survey of technical options along the way.
1
Contents
1 Introduction ............................................................................................................................................................................................ 2
1.1 Problem Setting ............................................................................................................................................................................. 2
1.2 Heating and Cooling Theory .......................................................................................................................................................... 2
1.3 Visionary Scenario ......................................................................................................................................................................... 3
1.4 Proposed Solution and Project Definition ..................................................................................................................................... 4
2 Technology Survey & Decision Framework ............................................................................................................................................. 4
2.1 Sensor Networks ........................................................................................................................................................................... 4
2.2 Sensors for Temperature and Humidity ........................................................................................................................................ 7
2.3 Decision Framework and Final Selection ....................................................................................................................................... 8
3 System Design ....................................................................................................................................................................................... 10
3.1 Sensor Node Design .................................................................................................................................................................... 11
3.1.1 Hardware ........................................................................................................................................................................... 11
3.1.2 Power Source Design ......................................................................................................................................................... 13
3.1.3 Software ............................................................................................................................................................................ 15
3.2 Data-logger Design ...................................................................................................................................................................... 16
3.2.1 Machine Selection ............................................................................................................................................................. 16
3.2.2 Software ............................................................................................................................................................................ 17
3.3 Design Limitations and Potential Improvements ........................................................................................................................ 18
4 Data Collection Results & Graphical Exploration .................................................................................................................................. 19
5 Modeling for Prediction and Inference ................................................................................................................................................. 24
5.1 Training Data ............................................................................................................................................................................... 24
5.2 Prediction Models ....................................................................................................................................................................... 25
5.2.1 Artificial Neural Network ................................................................................................................................................... 26
5.2.2 Linear Regression ............................................................................................................................................................... 30
5.3 Causal Inference .......................................................................................................................................................................... 34
6 Interactive Data Analysis and Visualization Platform ............................................................................................................................ 35
7 Conclusion ............................................................................................................................................................................................. 38
8 References ............................................................................................................................................................................................ 38
9 Appendix ................................................................................................................................................................................................. 0
9.1 Sensor Node Software: Photon C++ Script .................................................................................................................................... 0
9.2 Raspberry Pi: Logger Python Script ............................................................................................................................................... 1
9.3 Raspberry Pi Dropbox Backup Python Script ................................................................................................................................. 4
9.4 Group Account Information for Gmail, Dropbox, Particle Build .................................................................................................... 4
9.5 Ipython Notebook for Plotting Data .............................................................................................................................................. 4
9.6 Data Analysis Ipython Notebook ................................................................................................................................................... 4
2
1 Introduction
1.1 Problem Setting Hunt library has a heating and cooling system that fails to keep occupants comfortable. An attendant at the arts
information and reference desk on the fourth floor claims it’s too cold in the winter and too hot in the summer.
She keeps a space heater under her desk to mitigate the problem. The special collections and design librarian on
that same floor, who has been working in the building for over 30 years, says the problem stems from the entire
envelope of the building being made from aluminum and glass. The fourth floor has the most issues, she claims.
She mentioned hot and cold air pockets and issues unique to each side of the building. Her colleagues on the
west side of the building, she says, keep three blankets in their office as well as space heaters. They dress in
layers, and keep fans ready for the summer time. The computer services manager used to work on the second
floor. He was offered a larger office on the fourth floor. He knew there were HVAC issues, but moved there
anyway. Ever since, it’s been a battle. Sometimes he opens his windows. He says it’s “unstable”. Sometimes it’s
75 degrees outside but freezing inside. His thermostat is set to 80 degrees, hoping to gain another degree or two
from the current 68. The architecture librarian on the fourth floor claims that the air in the winter is dry. He says
some people bring in humidifiers. Coming out of the more controlled special collections / rare books room you
can feel the humidity change. On the third floor, the head of interlibrary loan mentioned the heating and cooling
doesn’t respond to what’s happening outside. A warm week in the winter means it’s going to be really hot
inside. In the lowest floor (basement), at a service desk, the attendant says it’s always too cold in the winter. He
keeps a large space heater on the floor to warm the room.
Figure 1: Space heater in the service desk on the bottom floor.
The problem of heating and cooling in Hunt library has been well established and is well known by various
faculty and staff around the Carnegie Mellon Campus. Some blame the building construction. Others blame the
HVAC system design, or its controls. If you walk through the library and look at the variety of different
thermostat settings, you might blame that. Either way, there is a point of pain - both for occupant comfort as
well as budgetary.
1.2 Heating and Cooling Theory The awry indoor conditions in Hunt Library are, like for any building, a function of several factors – both internal
and external. The essence of the physical phenomena dictating internal temperature and humidity is an energy
balance. The most important factors include outdoor temperature and humidity. Solar irradiation during the day
can also have an impact as the envelope of the building heats up and conducts into the interior. Outdoor wind
speed can induce convective heat transfer. Internal factors, such as heat generation from people, computer, and
3
lights, also have an impact. Finally, managing all of this are the mechanical systems. They add or remove heat
from the system, while also performing important functions such as providing fresh, clean air. The figure below
illustrates these interactions.
Figure 2: Thermal balance of a building.
Most air conditioned buildings are kept somewhere around 70 F. This can be slightly lower in the winter or
higher in the summer, but large deviations from this figure will result in occupant discomfort. If large variations
in temperature are present in a building – e.g. if one floor has in an open area a wide range of temperatures as a
result of poor HVAC performance – inefficiency results.
This basic theory of heating and cooling should be kept in mind in the rest of the sections. Data analysis
considered the factors discussed here carefully in designing a framework. Results for temperature and humidity
measurements should be considered in the context of this theory.
1.3 Visionary Scenario Having introduced the problem, we present our vision for two stakeholders involved.
The Facilities Manager
The facilities manager comes into his office and opens up his computer. He has access to data that would help
him make decisions to mitigate energy loss stemming from inefficiencies in Hunt. The data shows him a color
map of current temperature, humidity, and occupancy for each floor of the library. He is interested in looking at
average values over two weeks, so views that. Then he is interested in values as they vary over a day, and views
that. He feels empowered with this information. He can utilize this combined with HVAC operation and energy
consumption data to make informed decisions for optimization and maintenance.
The Office Worker
The problem: Jill walks into her office in the basement of Hunt library. She takes her coat off and leaves it on the
back of her chair. She’ll need it in a few minutes when she gets cold. She turns on her space heater which helps
her stay comfortable. She has a meeting before lunch. She heads up to the fourth floor and sits down around
4
the conference table. The meeting starts, but all of a sudden she’s feeling very warm. Indeed, she forgot she was
wearing 2 sweatshirts - she usually sheds them before heading upstairs. One day, she hears from another co-
worker that a new data driven approach has enabled progress on the heating and cooling problem in the library.
A few weeks later after noticing marked improvement in indoor conditions at work, she scraps the space heater,
eases through her morning routine without worrying about packing along extra sweaters, and finds herself more
focused and productive throughout the day.
1.4 Proposed Solution and Project Definition With the problem defined and the vision for resolution, we are ready to propose the components of our project
that will comprise an integrated solution.
The first goal of this project is to quantitatively assess indoor conditions – namely, temperature and humidity.
The spatial (variation by location in the building) and temporal (variation over time) profiles for these variables
are to be determined. Data exists for certain zone temperatures from the IBM Smart Campus project, but the
majority of spaces in the building do not have space temperature or humidity data (aside from mechanical
thermostats). Thus, a sensor network which measures temperature and humidity is to be developed. The
network will allow data collection at a fine resolution (6+ measurements per floor) so as to capture the profiles
we seek.
The second goal of this project is to identify important factors driving indoor conditions and predict those
conditions. The IBM Smart Campus project can provide data on the mechanical systems (supply and return air
temperatures, valve positions, flow rates, etc.). Outdoor measurements of temperature, humidity, solar
irradiation, and wind speed are also available. This information can be incorporated into a data analysis
framework to achieve our goal.
The third goal of this project is to visualize the results. We envision an application that allows a user to observe
in real time the indoor conditions through intelligent graphics, look up and download historical data, receive the
outputs from data analysis, and predict future conditions. This application will allow the building manager to
make data driven decisions to correct problems related to indoor conditions as well as provide building
occupants advanced knowledge of indoor conditions so they can dress or prepare accordingly.
2 Technology Survey & Decision Framework A variety of different technologies are available for implementing a sensor network to measure temperature and
humidity. A comprehensive review of these is included in this section. We consider technical performance, cost,
feasibility, and learning objectives in making a final selection to use for our system.
2.1 Sensor Networks Sensor networks allow for monitoring of physical conditions (such as temperature, humidity, or sound level) in
different locations and to pass the resulting data to a central point.
There are a variety of different types of sensor networks that are commonly used. Some include Wifi, Bluetooth,
and Radio Frequency communication methods. The main differences among these technologies include data
transmission rate, power consumption, required components, and cost. In this section we explore some of the
different technologies available. Solutions range from complete integrated products to “do-it-yourself”.
Wifi Based Sensor Networks
5
With wireless internet available almost anywhere this day and age, the concept of a sensing device that
connects to the wireless network is attractive. Such a solution framework is elegant because the ‘receiver’ in this
sensor network already exists - wireless routers. All a device has to do is send data through a wifi connection,
and this data can be stored in the cloud and accessed as needed through a browser or otherwise obtained
through download.
An integrated product that exists on the market is Wireless Sensor Tags. These consist of small, thin squares
(tags) which are sensors for temperature, humidity, motion, and more. The ideal tag for temperature and
humidity measurement is a 13 bit tag with +/- 2 % relative humidity and +/- 0.4 C temperature accuracy at a cost
of $29 each. An ethernet tag manager is also required and has a cost of $49. There does not appear to be bulk
pricing [1].
A more hands-on Wifi solution would be the Photon. This piece of hardware is designed to be an Internet of
Things enabler. Basically, it serves as a microcontroller with supporting components to be directly wired into a
sensor circuit, and is equipped with a Wifi chip to broadcast data. The Broadcom Wifi chip - the Po Wifi module -
is found in the Nest, LIFX, and Amazon Dash. The microcontroller is an STM32 ARM Cortex M3. It’s
programmable using Wiring which is also used by the Arduino. Code can be written on the web or in a local IDE
and sent wirelessly to the Photon - updates are possible remotely. It’s open source software. The power
requirements are 3.6-5.5 VDC or a USB power source [2]. Advantages to this type of network to the one above is
that an ethernet module is not required - the chips simply connect to the internet and put your data in the cloud
(a server which you get access to by Particle IO). The photon costs $19 per [3] and does not change in bulk.
While the Photon would be appropriate for prototyping with a network on the order of 10 nodes, larger scale
networks might be better constructed using the TI CC3200 SimpleLink - a Wi-Fi enabled MCU (microcontroller).
An integrated development circuit board including various types of sensors (a “plug it in and go” solution) - the
LaunchPad - allows for prototyping but is three times as expensive as just the MCU [4]. However,
implementation on the order of tens or hundreds of nodes would require integrating the SimpleLink into a
custom circuit board with the appropriate sensors and components. More specifically, the SimpleLInk has an
ARM Cortex-M4 microcontroller. The wifi chip is compatible with WPA2 Personal and Enterprise Security. The
power supply from batteries should be in the 2.1 to 3.6 V range. It has low-power modes such as hibernate and
low-power deep sleep. There are guides for application and programming, which can be done through the
Uniflash Standalone Flash Tool for TI Microcontrollers. One major advantage for large scale implementation
would be the fact that cost decreases with quantity ordered: 1-9 for $16.8, 10-24 for $14.78, 25-99 for $13.58,
and 100 for $12.62 [5] [6].
Radio Frequency Based Sensor Networks
ZigBee is a standards-based technology that is usually less expensive than Wifi and Bluetooth technology. It can work on a low duty cycle, allowing for sleep and rapid wake up which enables long battery life. A normal small coin-cell battery could provide one node with multi-year operation [7]. Compared to Bluetooth and Wifi, it has a relatively low information transmission rate (20~250 kbps). Additionally, ZigBee has a mesh network mechanism whereby nodes within the system are interconnected and self-recoverable so that if a node fails, other nodes can reconfigure for alternative routing paths for the new network structure [8]. One of the Zigbee integrated solutions is the TI CC2538 System-On-Chip (SoC). It has high processing power with the ARM Cortex-M3 application processor and sensor controller engine (SCE). It has support for Zigbee and IEEE802.15.4 mesh networking, which enables a 10-meter communications range and a transfer rate of 250 kbit/s for each node [9]. According to the power calculator provided by TI official website, a product designed with the Zigbee CC2538 (as the Zigbee end
6
device) operating at 3.0 volts with a typical 230 mAh coin cell could work nearly two years with 144 data transmissions per day (ie. one transmission per 10 minutes) [10]. The unit price for this SoC is $7.65 [11]. Note that this SoC is a chip that is to be integrated into a circuit – it’s meant for large scale application. Prototype level zigbee modules with leads that could be interfaced with a breadboard circuit and microcontroller are also available at a cost of $19 [12]. Another option for setting up an RF sensor network is to employ Arduinos with RF transmitters as sensor nodes and an Arduino/Raspberry Pi as a receiver hub as done by Ben Miller in an electronics article called “Building a Wireless Sensor Network in Your Home” [13]. Basically, the sensor nodes consist each of sensors, an arduino for signal processing, and an RF transmitter and antenna. The receiver consists of an antenna, receiver, arduino, logic convert, and Raspberry-Pi. Arduino is an open-source microcontroller widely used for prototype electronic projects. Arduino boards allow for reading digital or analogue signals such as those from a temperature sensor and producing outputs. One can send instructions (via the Arduino programming script which can be developed in the Arduino development environment (IDE)) to the microcontroller unit (MCU) on board for processing of inputs and production of desired outputs. The most economical option for Arduino would be Uno. It’s based on the ATmega328p MCU, which has 14 digital I/O pins, 6 analog inputs, and one USB port. This board could be powered via a USB connection or typical external power from an AC-to-DC adapter [14]. One of the most important advantages for Arduino board is that it could be conveniently programmed to achieve all kinds of functions incorporated with sensors, transmitters, receivers, and microcomputers. A Raspberry pi is actually a credit card-sized single board computer that is widely used for dealing with all kinds of processing work. After deriving the data from the Arduino board, a Raspberry Pi could compile the data into our expected format, and provide access for cloud storage or direct afterwards analyzing. One Arduino board costs $24.95 [15]. The total cost for this network, however, would include transmitters, receivers, antenna, sensor components, and a Raspberry Pi ($35.5) [16]. It’s estimated that the cost of 1 transmitter and 1 receiver could be over $100 [13].
Table 1: Comparison of sensor network technologies.
Sensor networks
Communication Technology
Cost Information Advantages Disadvantages
Wireless Sensor
Tags Wifi
$49 (receiver)+$29 per tag
Fully implemented sensor network solution.
No development needed (not valid
for a class)
Photon Wifi $19 per tag
Reprogrammable, digital/analogue, low
cost, feasible for prototyping.
Poor economies of scale.
TI chip Wifi $17-$12, depending
on quantity
Very low cost, ideal for large scale, low energy
consumption
Requires integrating into
circuit
Zigbee RF
Prototype Level – $19/ transceiver (no
MCU) Integrated SoC for
large scale $7.65/unit
Very low energy consumption, long range
communication.
Low information transmission rate
Arduino RF Over $100/unit Reprogrammable High cost
7
2.2 Sensors for Temperature and Humidity This section introduces options for sensors that measure temperature and humidity. It presents the physical
phenomena / mechanism of sensing, the specifications (accuracy, precision, etc.), and the cost for a variety of
different alternatives.
Temperature Sensors:
Temperature sensors operate on two primary principles: voltage due to a difference in temperature, and change in resistance from change in temperature [17].
Thermocouples are made from two different metals which join at a point. They produce a voltage due to the temperature difference between the two ends [18]. Thermocouples are the most commonly used temperature sensors, and are used where cost, simplicity and wide operating temperature range are vital, but high accuracy is not. Further, thermocouples are also susceptible to noisy data reading given the low voltage involved, and fare worse in terms of linearity and accuracy than the RTD. The price range of commercially available thermocouples on the internet is $4-$12. An example of a popular thermocouple sensor is the uxcell K Type -50-700C Thermocouple Probe Temperature Sensor [19].
Semiconductor sensors incorporate solid state devices like diodes or voltage references, and work on the same operating principle as thermocouple sensors, i.e, they produce a voltage proportional to the difference in the temperature between the transistors, both of which operate at different stable collector current densities.
Semiconductors are popular for use in embedded systems, given their delicate nature [20]. Types of semiconductor sensors include voltage output, digital output, resistance output, and diode. Semiconductor thermal sensors have accuracies similar to thermocouples, unless they are calibrated, in which case they exhibit improved accuracy. (+/- 0.75 degrees C to +/- 4 degrees C) They are also very small, and are a low cost alternative. The major drawbacks when considering semiconductors for use in the project is their delicate nature, and poor interchangeability.
Resistance Temperature Detectors (RTD) and thermistors work on the same operating principle. A change in temperature causes metals to expand/contract, thus causing a change in resistance which can be mapped to a change in temperature. RTD classes A and B, consist of 100 Ohm Platinum wire wound around a cylinder. The data logger inputs a known current, and measures voltage to determine resistance. Then, the resistance is mapped against the temperature resistance curve for the material used, to get the temperature. RTD sensors can also use copper or nickel. RTDs are more accurate than thermocouples, and are extremely stable over long periods of time. (Common RTD drifts are around 0.1 degrees centigrade/year) They also come with high standards for repeatability, and have a very low response time. Like semiconductors, RTDs have limited temperature range by broader industrial standards. However, this range (Typically -70-260 Degrees Centigrade) is still encompasses the range required for the project. The price range of commercially available RTD sensors is $10-$34 [21] [22]. Examples include the AGPtek Stainless Steel PT100 RTD Thermistor Sensor Probe [23] and the Liquid tight RTD sensor, 34 mm probe, 1/8 NPT Thread [24].
Thermistors work on the same principle as RTDs, and have resistances proportional to the ambient temperature. They have a operating range that satisfies the requirement for use in this project (-70 degrees C to 150 degrees C), and have a very high accuracy. (+-0.2 degrees C) Thermistors also offer a resistance change that is very non linear, resulting in very high accuracy (0.01 degree C), but are still not accurate as RTDs. Typically, they are cheaper than RTDs. Examples of thermistors include the Caldera Spa Temperature Sensor Thermistor Ewgx272
- 71578 ($32.02) [25] and 10K Thermistor Temperature Sensor | Miniature Stainless Steel ($10) [26].
Humidity Sensors
8
Humidity sensors are predominantly divided across resistive, capacitive, and thermal conductivity sensing technologies.
Capacitive humidity sensors operate on the principle that their dielectric constant is proportional to the relative humidity of the air around the sensor. These sensors have low temperature coefficients, and can function in very high temperatures. They are also produced in high-efficiency semiconductor manufacturing plants, and give rise to minimal drift over time. Typically, capacitive sensors give errors of ±2% RH with calibration. These sensors are most commonly used for applications in the industrial, commercial, and weather telemetry spaces [27].
Resistive Humidity Sensors work on the principle that hygroscopic mediums have an inverse relationship
for impedance (Normally exponential) with the relative humidity. Resistive humidity sensors have a number of
advantages: They are interchangeable/field replaceable within ±2% RH. These sensors are also small, cost-
effective, and have very high levels of stability over long periods of time. A significant downside of using these
sensors, however, is their susceptibility to chemical vapors and oil mist. They also give lower accuracy readings
on exposure to condensation if they use a water soluble coating. This, however, should not be a problem within
the scope of our project, given that the setting is on the inside of Hunt Library. Long term stability, low cost, size,
and interchangeability make these sensors ideal for industrial, commercial and residential applications [27].
Thermal conductivity based sensors are durable, and extremely resistant to chemical vapors. This is due to the inert materials like glass/semiconductor material that these sensors are used to make these sensors, thus also giving them a very high operating temperature range. These sensors are used in situations where the operating temperature is in excess of 200 F, and have an accuracy of ±5% RH at 40°C and ±0.5% RH at 100°C. Given the operating temperature range, applications include use in kilns, in machinery for drying textiles, cooking, and
pharmaceutical production, and thus, would not be suitable for use in the project [27]. Table 2: Summary of sensor types and performance.
Sensor Category Sensor Type Advantages Disadvantages
Temperature
Thermocouples Low cost, easy to
operate Low accuracy
Semiconductors High accuracy Unstable, need repairment or
replacement frequently
Resistance Temperature
Detectors (RTD)
High accuracy, high repeatability, stale
Limit range
Humidity
Capacitive Humidity Sensors
High accuracy, stable, able to recover from
condensation Large size, high cost
Resistive Humidity Sensors
High accuracy, low cost, small size
unable to recover from condensation
Thermal Conductivity
Humidity Sensor
Stable, resistance to physical and chemical
contaminants Low accuracy, high cost
2.3 Decision Framework and Final Selection The first step in choosing a technology was to filter out those which do not meet technical requirements of the
system. For the sensor network, the technical requirements are:
9
Ability to interface with analogue and digital sensors
Ability to be powered by either battery or plug point
Communication range appropriate for a large academic building
Scalability from small (2-4) to medium-large number (10-20) of nodes
For the sensors, the technical requirements are:
Accuracy +/- 0.5 C, 5% RH
Range 0-100 C, 0-100 % RH
The optimal choice will meet these technical requirements, and also score highest when considering criteria of
cost, feasibility, and learning objectives.
From the previous sections, we see that five communication technologies are available, which are the Photon, Wireless Sensor tags, the TI chip, the Arduino, and the Zigbee. In terms of cost, a good comparison is that of 4 nodes, because we’ll just take our measurement on one floor and the demand is not so large. As is shown in the following table, 4 nodes of Wireless Sensor Tags cost the least, followed by Photon, while Arduinos cost most. Notably, The cost calculation is an estimated one, including the combination of technologies, sensors and power sources.
Table 3: Costing Out Sensor Network Technologies
Technology Cost for 4 nodes Cost for 10 nodes Cost for 100 nodes
Wifi – Photon [28] $179.20 $431.50 $4,216.00
Wifi - Wireless Sensor Tags [29] $165.00 $339.00 $2,949.00
Wifi - TI Chip [30] $192.65 $467.57 $4,609.36
RF – Arduino [31] $250.19 $489.89 $4,085.00
RF – Zigbee [32] $181.40 $453.50 $4,290.00
In terms of feasibility, we compare how well each product suit our project. The most suitable communication tech seems to be WiFi, as it is everywhere in Hunt Library and we could get access and use it easily. On the other hand, RF-based products, like the Zigbee and the Arduino, require an additional ethernet, which most increase our difficulty and financial budget. Another requirement of our project is the product should be programmable, as we will write our whole code to fit for the specific environment. In this term, WST ranks the last, because it does not have an open source. Compared to WES, the Photon is a good product, allowing the simplest way to program it. Moreover, we also need to consider how to power the network. We could either power it by USB or battery, but in our case, battery seems to be better, as there's not enough USB chargers in Hunt. In this term, RF-based techs, though consume less energy, is not our best choice, as they all require a DC-to-AC convertor. The last concern is whether the product has less restrictions on choosing sensors. A good network will be the one which has both digital and analog input/output, and with a wide range of power output. In terms of this, Photon outcompete all the other products
As is mentioned before, WST performance relatively well in cost calculation, but it is still not a good option in our case. That is because WST is a whole system product, ranking the last in learning objectives, and gives us the least opportunity to study communication mechanisms. On the other hand, the second best option, the Photon, is an ideal product. As a tiny Wi-Fi development kit for prototyping and scaling, Photon is reprogrammable and can be connected to the cloud, allowing more learning objectives.
To reach a final decision, each factor( cost, feasibility, learning objectives) is to be assigned a weight to enable a quantitative comparison across the options for sensor network technology. In the end, the product
10
with the highest score is our final choice. Among all the factor, learning objectives is less important, so the weight is 2, while the other two factors each weigh 3.
Figure 3: Comparison of sensor network technologies for final decision.
From the bar chart above, we could find that the Photon has the highest score, and suits our project best.
As for sensors, several factors are important - cost, power voltage, accuracy and prototype feasibility. Specific to our case, another problem is also worth considering - whether to buy one temperature sensor and one humidity sensor separately, or buy a combined T/H sensor instead.
In terms of cost, the combined T/H sensor has more advantages over two separate sensors. For example, a typical thermal sensor (RTD Platinum-Clad NI Wire) costs $13.99 [38], and a typical humidity sensor (Sunkee AMT1001) costs $8.96 [39], so the total cost is about $22.95. While on the other hand, a combined T/H sensor, like RTH03, only cost $9.95 [40].
Based on the voltage produced by the Photon, two sensors are suitable - RTH03 and HDC 1080[41]. However, HDC 1080 is only a tiny chip and needs more components before connecting to the circuit. On the other, RTH03, with a power range from 3.3 to 6V, and an accuracy of 0.5C(T)/2%(H), meets all of our requirements, and can be easily placed into circuit. It is a digital sensor and thus more easily interfaces with the Photon as well as allows for easier extraction of the temperature and humidity. It is also factory calibrated. Considering all of this, the RTH03 combined sensor becomes our final choice.
3 System Design As discussed in the previous section, the Particle Photon was selected as the wifi-enabled microcontroller unit to
be utilized in the sensor nodes. The choice of the Particle Photon impacts the flow of data after collection. A
convenient feature of the Particle Photon is that data can be sent to the Particle cloud – a server hosted by the
parent company Particle – is that data can be stored temporarily before being fetched by users. Having
introduced this, we can now present the system schematic which summarizes how information flows from the
sensors to the final steps of data analysis and visualization:
11
Figure 4: System diagram.
Temperature and humidity are measured in the sensor nodes. They send these measurements to the Particle
Cloud server. A data logger fetches the measurements from the server and saves them so they can be used in
data analysis and visualization. These latter two components enable the user to make data driven decisions.
The current implemented system contains six sensor nodes (the three shown in the figure are just for
demonstration purposes). The subsequent subsections will explain the hardware and software design of these
nodes. Later subsections discuss the data management system.
3.1 Sensor Node Design The sensor nodes are the physical units that are to be placed to take measurements. Ideally, they are mobile so
they can be placed and moved as needed. They should be able to take measurements reliably and pass the
resulting on to a central point.
3.1.1 Hardware The circuit design for the sensor nodes appears below:
12
Figure 5: Circuit diagram
The white piece is the digital temperature and humidity sensor, chosen for it’s reasonable cost (~$10) and high
accuracy (+/- 0.5 C for temperature , +/- 2% for relative humidity) [33]. It has power into row 3 (it’s first pin, red
wire). A pull up resistor (1000 K) runs from its second pin in row 4 to power. Finally, its fourth pin (row 6, black)
is grounded. A 100 nF capacitor (green) is added between power and ground for smoothing. Its second pin (row
4) is connected to digital input D5 on the photon (row 17, purple wire). Power is supplied from the 3V3 volt pin
(row 22, red). The photon is powered through via battery. The input range for Vin on the photon is 3.6 to 5.5
volts. Four AA batteries in series produce a voltage of 6 V, which is then dropped to 4 V through a linear
regulator – the MCP1702 (black part in the middle of the circuit board). This allows for voltage decline of the
batteries (this will be discussed further below).
The data sheet for the sensor specifies that the device is factory calibrated and ready to use. For the first two
sensor nodes, studies of accuracy and precision on readings for the temperature were conducted. A high
precision temperature sensor was set in the CEE classroom and allowed to come to a steady state reading for
one half of an hour. The temperature compared to the reading from the digital sensor within 0.5 C. The
precision of the sensors is also validated. When placed adjacent to one another, two sensors register readings
within 0.5 C and 2% relative humidity of each other. The relative humidity was not validated, as a probe could
not be accessed for this. It does, however, respond to clear changes in humidity (for example, if a human finger
touches the sensor). The other four sensors were assumed to be accurate.
The following is a Bill of Materials reflecting the components necessary for each node, to be scaled to the
number of nodes to be implemented. The parts are linked to a source for purchase.
13
Table 4: Bill of Materials, Per node
Part Quantity Unit Cost Cost
Particle Photon 1 $19 $19.00
1 kOhm resistor 1 $0.19 $0.19
1 nF capacitor 1 $1.10 $1.10
4xAA battery case 1 $1.60 $1.60
1 uF capacitor 2 $0.05 $0.10
LDO MCP1702 1 $0.48 $0.48
RHT03 sensor 1 $9.50 $9.50
Half breadboard 1 $2.67 $2.67
Breadboard wire ~10 $0.01 $0.10
Jumper cables to sensor 6 $0.07 $0.42
Total $35.16
The total cost of $35.16 is nearly $10 less than the estimated cost listed in section 2.3. It is also less than the
Wireless tags technology mentioned during the technology survey. The battery design which enables reliable
power at a small cost is major factor in driving down the price. The cost comparison of different powering
options will be discussed in the subsection below as well.
3.1.2 Power Source Design To power the photon, there were two options: USB power (wired) and battery power (wireless). USB power
involves a 5V, 1A “wall wart” charger that would need to have either a USB extension or power extension cord
to reach the location of the node. It’s clear that there are disadvantages to USB power – extension cords will
need to run from the plug point to the node, which may be obtrusive or even present safety hazards. USB would
have the following cost components: Wall wart: $3.25 [34] , USB Extension (10 ft): $5.80 [35] , Micro USB (3 ft):
$2.25 [36]. Thus - a total cost of: $11.30 per node. It could be higher if lengths of more than 10 ft for cables are
required. The cost to tape down cables in certain areas, or run them above would also be a component.
Batteries, on the other hand, would allow nodes to be placed in the desired location without consideration of
plug points or obtrusive wire runs. With this improvement in mobility and safety, the cost is the only unknown
factor. To determine cost, a battery design solution is required.
Let us first define requirements for a battery power design solution. The measured current consumption during
Wifi connection is about 90 mA, while it is nearly zero (160 uA) during sleep mode. Thus, the battery needs to
supply up to 100 mA of current, and meet the voltage input requirement for the photon (3.6 to 5.5 V). It should
be low cost, not consume much space, have a long life (2 weeks +), and provide positive and negative leads to be
plugged into a breadboard. While preexisting battery solutions such as the Photon Battery Shield paired with a
LIPO battery [37] exist, their cost exceeds that of the USB power solution ($13+). A low cost battery solution can
leverage AA alkaline batteries in series. Each AA battery provides 1.5 V, and 4 in series provides 6 V. A linear
regulator can be used to output a constant 4V. The regulator consumes about 2 uA of current (negligible) and
has a dropout voltage of about 0.6 V (the voltage change across the regulator itself) [38]. To be conservative,
let’s assume the photon should never receive less than 4 volts. With the 0.6 V dropout, the battery should never
output less than 4.6 V. Four, as opposed to three, batteries are in series since voltage drops over the lifetime of
the battery – three batteries will hit a voltage lower than 4.6V. The figure below shows voltage drop over the
lifetime of a 1.5 V AA alkaline battery. Note that the capacity in Ah (amp-hours) changes with discharge rate as
well as the voltage drop.
14
Figure 6: Voltage and capacity changes with discharge rate and usage. Image source: [39]
The method of Amp-Hours is used to find an estimate of the life of the batteries for our application. Thus, we
need to know the discharge rate and the capacity of the battery. The typical number of mAh in an Alkaline Long-
Life (Duracell Coppertop) AA battery is about 2100-2200 mAh, when discharged at 100 mA [40]. As the discharge
rate drops, the life of the battery increases, and the rate of voltage loss also decreases, enabling an even voltage
until very close to full discharge (see above figure). An energizer AA battery, discharged at a rate of 25 mA is
supposed to have about 3000 mAh [41]. Thus, depending on the battery brand, capacity can vary.
For this case, the discharge rate can be found by an average. The amount of time it takes to take a measurement
is about 10 seconds. There is a period of connecting to wifi, measurement, and then a five second buffer to
make sure enough time to send is given. Thus, the average discharge rate is:
10 𝑠
10 𝑚𝑖𝑛𝑥
1 𝑚𝑖𝑛
60 𝑠𝑥 90 𝑚𝐴ℎ = 1.5 𝑚𝐴ℎ
This is extremely low, so the maximum capacity of the battery should be achievable. However, for our
calculation, to be conservative let’s assume we are at the 100 mA discharge rate – the right figure in Figure 6.
For a 4.6 V minimum, each battery should have a 1.15 V minimum. From Figure 6, we can see that the capacity
used at 1.15 V is about 1.6 AH.
Then, we find the expected life as:
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐿𝑖𝑓𝑒 (ℎ𝑟𝑠) =1600 𝑚𝐴ℎ
1.5𝑚𝐴ℎ= 1333.3 ℎ𝑟𝑠 = 44.4 𝑑𝑎𝑦𝑠
Note that this calculation is conservative in assumed battery capacity to minimum voltage – it’s a lower bound
for using AA batteries. This is a relatively long life for a prototype (about 1.5 months), and could be extended
further if C or D batteries (same voltage, higher capacity) were used, or if measurements were taken less often
(each half our or hour could be justified instead of each 10 mins). For example, an energizer D battery (also 1.5
V) discharged at 25 mA has a capacity of about 17000 mAh [41]. If measurements were taken once per hour, the
life could be increased by about 50 – a life of about 6 years could be possible.
The components required for this battery solution are shown in the circuit diagram, and include a battery holder
case ($1.40 [42]), 4 Alkaline AA batteries ($0.24-$0.50 per battery [43] -> $1.00-$2.00), and a linear regulator
($0.48 [44]). A total cost of about $3 - $4 per node is expected. The cost is three times lower than the USB
solution, thus making the argument for batteries compelling.
15
3.1.3 Software
The software component of the sensor nodes is focused on the programming set for the photon microcontroller.
The program must meet the following requirements: it must read digital data from the sensor, push this data to
the cloud through wifi, and it must enter a sleeping state to conserve battery power. It must keep a timer in the
sleep state and wake up every 10 minutes to measure and send data.
Before delving in more specifics, let us introduce some important definitions we will be using: we have
mentioned that the Particle Photon is a wifi-enabled microcontroller that can pass data to the server of the
parent company – the Particle Cloud. There are two methods for passing data – the first is using a Particle
Variable. The photon creates a variable, and can update the variable with data. The variable can be accessed
from the server. Variables are stored temporarily on the server and can be fetched to a computer or database. If
the Photon that creates the variable remains connected to the internet, the variable can be accessed at any time
on the cloud. However, if the Photon goes into sleep mode, the variable is lost on the cloud. An event is similar
to a variable, except that other photons or computers can subscribe to an event name and listen for events. In
this fashion, the photon can wake up, take a measurement, update the event by publishing data to it, and then
going back to sleep. Other devices listening for the event will hear the event and can receive its accompanying
data. Thus, our software leverages the event functionality. This will be important later when discussing the data
logging aspect of the system.
Sources used in developing the program for the Photon include the RHT03 library example [45] as well as
documentation of a similar example which used the Spark Fun Inventor’s Kit to make an Environment Monitor
[46]. The latter used the same sensor as we are using and thus helped familiarize us with use of the RHT03
library. Pseudo code for the program is as below:
wake up
take T,H measurement
while failed measurement:
brief delay
try again
publish to event: node#,T,H
sleep for 10 mins
The error handling occurs where we measure until the measurement is successful. If this is not present, failed measurements are present in the data at a higher frequency. The full code (C++) is included in the Appendices.
Note that the above design is that for the final design which involved 6 nodes. Before arriving at this, a two node
system that ran from USB power made use of particle variables to publish data instead of events, and didn’t
require sleep mode. Then, the same two node system was powered by battery and events were used. However,
we due to a development lag on the data logger side of things, we still needed to fetch data from particle
variables. Thus, we used a ‘mother’ Photon that listened to events from the nodes and saved their resulting data
to variables on the cloud. The details for these previous designs can be found in the Phase 2 report.
16
3.2 Data-logger Design The data logger must retrieve data sent from the sensor nodes to the cloud and ensure it is saved. Our final
design uses a Raspberry Pi to do this. The selection of this component and the software used for it’s functioning
will be described in this section.
Figure 7: Data - logger conceptual diagram.
3.2.1 Machine Selection A data-logger must meet the following criteria: it must be always running, it must be able to connect to the
internet and listen for the events, receive their data, and store that data or pass it on to be stored safely.
Our team experimented with a variety of different tools to serve as a data logger, including Google Sheets (with
Google Apps Scripting), the Carnegie Mellon Linux servers, a Linux computer offered by our professor, and a
Raspberry Pi (RPi). Google Sheets experienced issues with allowed computing resources – an error with
description “Service using too much computer time for one day” kept recurring. We found the RPi to be
particularly well suited for this application – it is a Linux machine from which we can run python scripts to do the
data collection. It has 8 GB of storage (depending on SD card) and can run on lower power (5W). It can be
accessed remotely via SSH for debugging or script changing. Also important – since we own the device and are
the root user with ‘sudo’ permissions - we can install programs and packages as needed. For example, we
installed pip (python package installer) which made installing the various python packages we used much easier.
On the Linux servers and Linux computer mentioned before, installing python packages became an issue due to
privileges and the complications of local installs.
The Raspberry Pi can either store or pass data. Saving the data can be as simple as writing rows to a CSV file –
this is the approach we used. Another option is to push the data to a Database (SQL). This option was explored,
but was later abandoned due to technical difficulties and a general lack of need in the prototype system. To save
the data, we had to ensure there was enough space. With 8 GB of storage space (which just depends on the size
of the SD card), it had plenty of room compared to the estimated 0.6 MB our data would require for two seeks
of collection:
Data Storage Calculation:
17
Assume there are 20 sensors (high estimate). Each send will include a variable name and a piece of data
as a string. Variable names can be as small as a number for each sensor node and a letter indicating
temperature (t) or humidity (h): e.g. r1,t1,r2,t2, etc. This will be of size 2 𝑏𝑦𝑡𝑒𝑠. The sent data will be a
string of the temperature / humidity value: five characters (tens, ones, decimal point, two decimal
places) e.g. t1=50.01 F. Each character is 1 𝑏𝑦𝑡𝑒, so a total of 5 𝑏𝑦𝑡𝑒𝑠. So, each send for each node will
be 2 𝑥 (2 + 5 𝑏𝑦𝑡𝑒𝑠) = 14 𝑏𝑦𝑡𝑒𝑠. With 20 sensors, this will be 20𝑥14 𝑏𝑦𝑡𝑒𝑠 =
280 𝑏𝑦𝑡𝑒𝑠 𝑝𝑒𝑟 𝑠𝑒𝑛𝑑. Assuming we send every 10 minutes, the data size for two weeks of monitoring
would be:
280 𝑏𝑦𝑡𝑒𝑠
𝑠𝑒𝑛𝑑𝑥 (
24 ℎ𝑟𝑠 𝑥 60 𝑚𝑖𝑛𝑢𝑡𝑒𝑠
ℎ𝑟10 𝑚𝑖𝑛𝑢𝑡𝑒𝑠
𝑠𝑒𝑛𝑑𝑠
𝑑𝑎𝑦) 𝑥14 𝑑𝑎𝑦𝑠 = 564,480 𝐵𝑦𝑡𝑒𝑠
= 564.48 𝑘𝐵 = 0.56448 𝑀𝐵
Finally, the data logged onto the Pi could be accessed at any time via remote SSH file transfer, but a more
convenient method (and a safe practice) would be to back up the data to a Google Drive or Dropbox Account.
This functionality is implemented – details are to follow.
3.2.2 Software Now we will discuss the specifics of the python script that does the listening and logging. First – let us introduce
server side events (SSE). The Particle Event that we publish data to can be listened for using SSE. In SSE, a stream
is set up through which events flow. This can be easily implemented in Python. The actual data that arrives in
the stream with an event is a Unicode string of characters that contains in it the sensor node, variable names,
and data: e.g.
event: 12745_data
data: {"data":"temp2:81.14humd2:37.10","ttl":"600","published_at":"2016-05-
02T20:52:04.846Z","coreid":"36003a000b47343432313031"}
The python script must parse out the information from this. Note that there are six nodes, each sending in
events at different times but every ten minutes. Some drift occurs in timing since error handling on the node
side can pass time. Data analysis simply interpolates values at the desired time – so if the resolution of
measurements is small (10 minutes), actual timing of data doesn’t matter as long as it comes in roughly every 10
minutes.
During design iterations, we tried a few different methods to listen for and save this data. We first tried saving
the data to variables and each time all six came in, we’d save the variables values as a row to a csv file. This fails
if one of the nodes doesn’t come in, however. We then tried setting a timer such that every ten minutes, we
save all variables to the file, and if a variable wasn’t updated, we write ‘fail’. In both of these methods, however,
if the script is saving data while an event comes in, it may miss the event. Thus, it’s better to separate each
node. Since each node will only publish each 10 minutes, if you save after it comes in you have plenty of time.
Moreover, separating the collection makes it easier to handle errors – you don’t have to worry about checking if
all variables were updated – you just save the data when it comes. However, having 6 scripts running is
inefficient and non-elegant. A better solution is to separate the listening and saving behavior. Thus – we need
multithreading.
18
In multithreading, threads or processes run, sharing the same memory space. A queue can be constructed to
pass data between the threads. Such is how our solution works. We have a listener thread and a logger thread.
The listener listens for events on the event stream, and if an event comes in, it saves the relevant data to a
queue. A logger thread then pulls from the queue and saves the data to a CSV file. The logger checks the queue
every 30 seconds.
Figure 8: Illustration of multi-threading used in the data logging script with blocks and pseudo code.
It is also important to point out that after some amount of time functioning (2-3 days) the listening thread could
freeze. This is likely due to a broken event stream, although we are not sure. The solution to this is to having
another thread that times how long it’s been since an event was logged. If a few hours pass, and nothing has
come in, it creates another instance of the listening thread. In python, you can’t just kill off the older, frozen
thread. Thus, a nightly restart of the script could do that – is not expected that this occurs more than once per
day.
The python script is found in the appendix.
The above information is specific to the latest, 6 node system. However, the first data logging design was for a
two node system that made us of Particle Variables. In that case, server side events were not necessary. Every
10 minutes, the logging script would get the variables values from a url address and save them. The details for
this design can be found in the Phase 2 report.
3.3 Design Limitations and Potential Improvements Photon Programming
Around finals time, it was discovered that two of the photons were not reporting measurements at one point.
We investigated and found that in their particular location and elevation, there was a wifi ‘dead zone’. One node
had run out of batteries after having continuously attempted to connect, while the other was at such a low
19
voltage the Wifi functionality was disabled. We replaced the batteries and placed the nodes back in their
respective locations, but the problem persisted – measurements were not coming in. After removing the nodes
from the top of the bookshelves and bringing them to standing level they connected and resumed measuring
data. Thus, we concluded that the issue was a wifi dead zone. It is likely that increased demand during finals
time stressed the wifi network, since the nodes had been successfully measuring data for 7 days before this
occurred.
The solution to this issue would be to start a timer in the code before connecting to the wifi. If time to connect
exceeds say 2 minutes, the photon could shut down. The measurements could still be taken and simply stored
on board the microcontroller. The next time the photon is able to connect to the internet, it would send the
logged data. Moreover, this suggests an approach that could save battery life. Wifi could be only activated every
n measurements – where n is to be chosen such that the desire for real time data is satisfied to some degree,
and the photon is able to store the data before transmitting it. Since connecting to wifi uses more power,
battery life would increase if, for example, n=10 and only once in ten times was the wifi activated and the
measurements sent.
Data Logging
It would be ideal to kill off frozen event stream listener threads instead of having to do a nightly restart of the
script. A shell script that could find the process ID for frozen listener threads and kill them off would work for
this.
Also, the listener / logger thread could be improved by pushing more of the data parsing to the logger thread.
Currently, the listener thread checks to make sure the event isn’t blank. Then it checks to make sure the event
has the relevant data (temp and humidity). Instead of then having the listener thread parse out the temperature
and humidity from the data string, the logger could do this (in addition to its prior function of saving the data to
a CSV file). While there haven’t been any issues related to missing data due to parsing out the data (it occurs in
fractions of a second), this would be a safe move.
Future Direction
The design outlined here works well for this prototype system. The Photon, Raspberry Pi, breadboards, and
other hardware and software components of the system are ideal for a prototype. However, a more mature
design would likely require a more concise, inexpensive, and robust hardware design and a more robust
software and data handling design. A major dependence of the prototype is on a preexisting wifi network to
connect to. A more mature design might leverage radio frequency transmission to extend range and reduce such
a dependence. Among the technologies from the survey, a Zigbee network should be explored considering the
low power requirements, long battery life, and long range. If it is still desired to use Wifi but to improve node
design and scale the system, the TI wifi MCU mentioned in that technology survey section would be ideal, since
cost deceases with quantity ordered.
4 Data Collection Results & Graphical Exploration Data collection was primarily used as a means to test the reliability of the system and get a preliminary map of
the spatial and temporal profiles of temperature and humidity. We focused on the fourth floor of the library,
where, according to user interviews, temperatures are problematic. Placement of the nodes was on top of
bookshelves, which potentially exposed them to the top of a thermal gradient resulting in higher temperatures.
Nearby air supply vents could have had an impact on readings. We elaborate on how more accurate space
20
conditions can be measured later in this section. However, for the scope and timeframe of this course, we
judged our placement of nodes to be satisfactory considering our objectives. We needed an unobtrusive
implementation that would be exposed to potential tampering.
The approximate location of sensors for the six node system is shown below.
Figure 9: Placement of sensor nodes.
First we collected data with the two node prototype system at locations 1 and 4 in the above diagram for nine
days. Below the resulting measurements are plotted:
Figure 10: Data Collection for the 2 node system – temperature.
21
Figure 11: Data collection for 2 node system - humidity.
Data was collected for 11 days for the 6 node system. Below the results are presented. Note there is a gap of
about 1 day between 4/26 and 4/27 – this was when the data logger script was redesigned to include threading
as described in the previous section. Also note that nodes 4 and 5 stop at around 4/30 since there was a wifi
dead zone in the library at their locations.
Figure 12: Six node system data collection - temperature.
22
Figure 13: Six node system data collection - humidity.
There are several observations that can be made from these data collection results. We will discuss temperature
first. For the two node system, note that the first several days show relatively steady values of temperature until
the last two days when temperature increases. A diurnal pattern emerges where temperature increases (2-3 F)
and peaks at noon and then decreases. For the six node system, this pattern continues. The average
temperature for the nodes varies substantially – up to a 5 F temperature range.
One might expect nodes that are near each other in location to have closer readings than nodes that are further
away. Moreover, one might expect nodes connected by open air as opposed to physical boundaries to have
similar measurements. Examining Figure 7 it can be seen the Fine and Rare book room sits between nodes 2 and
5 – higher temperature differences might be expected between these nodes as opposed to nodes 1 and 4, and 3
and 6 which have open air between them. Indeed, examining Figures 8-11, note that nodes 5 and 6 on the
Northeast side of the library have similar temperature results. Nodes (1 and 4) and (3 and 6) are closer in
magnitude than nodes 2 and 5. Nodes 2 and 5 show the largest difference in average temperature - 5F.
The identified diurnal pattern of indoor temperature could align with outdoor temperature changes. While the
expectation might be that indoor temperature remains steady despite outdoor conditions, we show below that
this is not the case – indeed, outdoor temperate peaks correspond to indoor temperature peaks.
23
Figure 14: Indoor and Outdoor air temperatures. Outdoor temperature data provided by the Center for Building Performance and Diagnostics at Carnegie Mellon University.
Note that at least three of the six nodes the temperature exceeded recommended indoor levels which range in
the 68-76 F range for winter and 73-79 F range for summer [47]. However, as mentioned before, the measured
temperatures could be higher than space temperatures due to elevation.
Now examining humidity, note that for the two node system the humidity is relatively close in magnitude. There
are some random increases and decreases resulting in peaks among the baseline value of about 20%. At one
point the relative humidity goes up to 40%, and for a large portion of the data the relative humidity is between
15 and 20%. For the six node system, nodes 1 and 2 show similar RH range as in the 2 node system (same
locations). The other four nodes have higher values – mostly in the 30-50% range. All six nodes increase and
decrease in the same way, maintaining an offset from each other. Recommended indoor humidity levels range
from 30-65% [47]. If the measured values are indicative of the humidity in the occupied spaces, then the spaces
are on average too dry. This may be desired in a library where books must be kept in careful condition, but the
space is heavily occupied by students studying as well. The indoor humidity may be affected by outdoor
humidity. Let’s investigate visually:
24
Figure 15: Outdoor and indoor relative humidity measurements. Outdoor humidity data provided by the Center for Building Performance and Diagnostics at Carnegie Mellon University.
Here the relationship is less clear – some peaks and valleys in outdoor humidity map to indoor humidity.
It is suggested as future work to place the nodes at locations where space temperatures can be more accurately
measured. Certain issues such as exposure to air from supply vents, different velocities could still be an issue,
but placement at approximately 4-6 ft should reflect a more accurate general temperature. It is suggested to
encase the nodes such that they are tamper proof, and place them in spaces where students are working.
5 Modeling for Prediction and Inference Having visualized the data and some of the relationships between indoor and outdoor temperature and
humidity, we now move on to a more quantitative data analysis component of the project. The objective of this
effort is twofold:
Determine a ranked order of causal factors that influence the internal temperature and humidity inside
the Hunt Library. This information would help inform any corrective action to mitigate the effect each of
these factors have on internal temperature and humidity.
We seek to build predictive algorithms that can be used to make daily predictions on internal
temperature and humidity inside Hunt Library. For individuals working at Hunt, this would help address
the unpredictability of internal conditions on a day-to-day basis, and make for a more pleasant
experience.
5.1 Training Data
The training data of the artificial neural network consists of two primary categories of data-points. Data
corresponding to the input layer, and data corresponding to the output layer. The output layer data is acquired
from the sensor network we developed, and consisted of indoor temperature and humidity data at 10-minute
intervals over a 20-day time span. For the corresponding input-layer data, we identified two subsets of parameters
25
that could influence the output layer data: Internal factors and external factors. The following parameters have
been considered potentially important input parameters to the algorithms:
External Factors
o Outdoor Temperature
o Outdoor Humidity
o Wind-Speed
o Solar Radiation
Internal Factors
o Return Air Temperature
o Hot Air Supply Temperature
o DA Temp (Fresh Air Temp)
o Mixing Damper
o HD and CD actuator position
Outdoor factors like external temperature, humidity, wind-speeds, and solar radiation were acquired from the
Centre for Building Performance and Diagnostics at Carnegie Mellon. The center collects this data at 5-minute
time intervals, and represents an ideal primary data source given that all data is collected in close proximity to
Hunt Library. The data on the internal factors, we believe, is crucial to making very accurate internal temperature
and humidity predictions given that they are a true representation of the HVAC system condition, which is likely
to be a stronger factor in accurate prediction than the external factors are. We coordinated with with Prof. Pine
Liu from Civil & Environmental Engineering, Denise McConnell from the Facilities Management Services, and
Jingkun Gao a Ph.D student at the Department of Civil & Environmental Engineering to better understand, and
seek access to the IBM Smart Campus Initiative data collected within Hunt. While we were able to get access to
the real time data, we did not have sufficient time to set up a mechanism to access all trending data over the
period of interest. Accessing this internal data and plugging it into the algorithm is, therefore, going to be part of
the future work on this project.
The objective of the data import section of the ipython notebook was to retrieve data from the Centre for Building
Performance and Diagnostics, as well as the sensor data and combine them into a single time aligned data-frame.
Once the data-frame was in place, we used approximately 90% of the input data (Data collected in the period
04/22/2016-05/01/2016) for training the algorithms, and the other 10% (Data collected in the period 05/01/2016-
05/04/2016) for testing.
5.2 Prediction Models
Our approach to this problem has been focused around preliminary identification of all parameters that might
have an influence on the overall temperature and humidity. Ideally, any automated multi-purpose algorithm used
for internal temperature and humidity prediction must be capable of accounting for, and studying the influence
of a number of causal factors and input parameters, and the degree to which they influence changes in internal
temperature and humidity in Hunt. In addition, they must at the very least show potential for accurate prediction
of internal temperature and humidity.
26
5.2.1 Artificial Neural Network
While there is little literature around internal temperature and humidity prediction within buildings, a number of
models including time-series models, Fourier series models, regression models, and Artificial Neural Network
(ANN) models have been employed for load prediction [48]. We choose to employ a forward-backward artificial
neural network algorithm for this study. The ANN algorithm, in essence, seeks to replicate the neuron transmission
network of the human brain. It has some distinct advantages: It can, over a long period of time, teach itself to
learn non-linear relationships between input parameters and outputs of the neural network, while invalidating
the need for developing any understanding of the statistical relationships between the input parameters and the
predicted variables. Further, ANNs have been shown to perform better for building load prediction applications
than other traditional learning approaches like regression [49].
5.2.1.1 Mechanics of an Artificial Neural Network and Training
ANNs, as mentioned above, seek to make predictions on final outputs basing on a linear combination of inputs
from the previous layer. The term “layer” is important to understand the network optimization approach that the
ANN uses. Initially, an ANN will take training data for parameters in the first layer (Also called the input layer), and
corresponding training data outputs for the output layer. Essentially, we seek to teach the ANN through a series
of input combinations for which the corresponding output is a certain discrete value. This approach to learning is
often necessary to help algorithms learn optimally, in order to make accurate predictions. A schematic
representation of an ANN is as follows [50]:
Figure 16: ANN graphic representation. Figure source: [50]
27
The value of the “hidden layers” in the diagram above is given by a non-linear transfer function, which computes
a weighted sum of all inputs from the previous layer, plus a bias term. On each iteration, a set of input parameter
values are “propagated forward” through an ANN. These inputs after being aggregated as a weighted
combination, undergo transformations at each node. The nature of this transformation depends on the nodal
design chosen by the algorithm designer. The sigmoidal unit is the transformation function we chose for each
node within the ANN. That is, it is the rule that governs the output value of a node given an input that is equal to
the linear combination of nodal values from the previous layer. Sigmoidal units are not zero centered, and are
prone to saturation for large input values. However, input value magnification to the extent where the sigmoid
unit would return a zero value (and kill off that particular branch of the network) are unlikely in ANNs with one
hidden layer. However, given a more convoluted neural network, other activation functions like the Tanh, RELU
[51], and Leaky RELU might be considered. At the end of the forward propagation, over the output layer of the
ANN, we compute the L2 Norm between predicted output variables and actual output variables. In an ANN, we
update weight vectors at each iteration with the objective of minimizing this l2 norm (Euclidian distance) over the
output layer.
In directional terms, the process of back-propagation is the exact opposite of forward-propagation. Subsequent
to forward propagation, we calculate the losses over the output layer. After this, we propagate the gradient
calculated over the loss backward through the network. This process utilizes the chain rule, and we end with the
expression for the gradient of the loss function with respect to each of the inputs. The 'learnprocess' function in
the iPython notebook lays out the underlying mathematics involved in the forward and backward propagation
mechanics in detail.
These values are recomputed repeatedly by iteratively varying weights (Using gradient descent optimization) to
minimize the squared error of the difference between the target function output, and the actual prediction as
given by the training data. We stop this iterative process when this error converges to a predetermined level.
There is little known science to the process of designing hidden layers in an ANN. Usually, it is a plug and play, and
the design that gives us the optimal prediction accuracy is normally chosen. As an experiment in choosing the
hidden layer design for the ANN, we computed ANN error over various hidden layer designs during testing, and
picked the hidden layer designs that resulted in least testing error. This design was then adopted, and training was
performed for that specific design (Say, 9 Nodes) to ensure the weights at each node were iterated to give the
optimum global minimum for loss over the output layer. The visualization (see following page) for the prediction
error observed over a number of hidden layer designs (In terms of No. of Nodes) for each one of temperature and
humidity helped us arrive on an optimal hidden layer design to implement the algorithm.
28
Temperature:
Figure 17: Error vs Number of Hidden Layer Nodes - Temperature.
.
Humidity:
Figure 18: Error vs Number of Hidden Layer Nodes - Humidity.
5.2.1.2 Results
Subsequent to training the ANNs (With optimized hidden layer design) individually for temperature and humidity,
we passed our test data sets through the ANN algorithms. The prediction results were as follows:
29
Temperature
Figure 19: Predicted vs Actual Temperature
Humidity
Figure 20: Predicted vs Actual Humidity
While the prediction for temperature seems slightly better than that for humidity, in general, the ANN does a poor
job of predicting the internal temperature and humidity. We think that the availability of the IBM Smart Campus
data would have made for a more expressive set of input parameters that could be, as mentioned previously,
more important predictors than the external parameters. The ANN algorithm has proven effectiveness in
multivariate predictive modeling, and with some more time invested in researching the algorithm, and building a
more expressive set of input predictor variables, we believe that the algorithm will show improved results.
30
5.2.2 Linear Regression
Having completed the ANN design and prediction, we now seek to implement a Linear Regression model primarily
for causal inference, but also as a learning experiment in predictive modeling. First, we seek to understand the
relationship between predictor variables (Inputs), and response variables. (Output) In order to do this, we first
visualize the relationship of the input variables with each one of internal temperature and humidity. A sample
visualization plotted between external temperature and internal humidity is as follows:
Figure 21: External Temperature vs Internal Humidity
The relationships between the predictor variables and Internal Temperature and Humidity (Hum_1/Temp_1),
therefore, seem to be based on a higher order multinomial relationship. (An exhaustive set of plots of these
correlations can be found on the iPython Notebook) We will conduct tests on polynomial regression models of
different orders to determine the one that gives us the best prediction.
5.2.2.1 Linear Regression Mechanics
The objective with linear regression, is to estimate Temperature and Humidity given predictor variables values.
The equation is as follows:
𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝑇𝑒𝑚𝑝𝑒𝑟𝑎𝑡𝑢𝑟𝑒 = 𝑎0 + 𝑎1 (𝑆𝑜𝑙𝑎𝑟 𝑅𝑎𝑑𝑖𝑎𝑡𝑖𝑜𝑛) + 𝑎2 (𝑇𝑒𝑚𝑝𝑒𝑟𝑎𝑡𝑢𝑟𝑒) + 𝑎3 (𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦) + 𝑎4 (𝑊𝑖𝑛𝑑 𝑆𝑝𝑒𝑒𝑑)
𝐼𝑛𝑡𝑒𝑟𝑛𝑎𝑙 𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦 = 𝑎0 + 𝑎1 (𝑆𝑜𝑙𝑎𝑟 𝑅𝑎𝑑𝑖𝑎𝑡𝑖𝑜𝑛) + 𝑎2 (𝑇𝑒𝑚𝑝𝑒𝑟𝑎𝑡𝑢𝑟𝑒) + 𝑎3 (𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦) + 𝑎4 (𝑊𝑖𝑛𝑑 𝑆𝑝𝑒𝑒𝑑)
Multivariate regression offers a convenient closed form solution that would allow us to determine the coefficients
corresponding to each input variable:
31
𝐴 = [(𝑋𝑇𝑋)−1]𝑋𝑇𝑌
Where A is the vector of coefficients corresponding to each predictor variable. X is the matrix of predictor
variables, and Y is the corresponding vector of outputs observed in each training example.
We then experiment with first and second order linear regression approaches to prediction. The primary
difference between the two, is that the second order regression explodes the input variables to a higher
dimensional space. Intuitively speaking, this explosion of space allows for more dimensional degrees of freedom,
which in turn would allow the regression algorithm to better fit with the data. Again, we use the training data to
estimate parameters, after which we pass the testing data through the algorithm, which then allows us to visualize
testing accuracy.
5.2.2.2 Linear Regression Results
First Order:
Temperature
Figure 22: First Order Prediction vs Actual Internal Temperature
Humidity
32
Figure 23: First Order Prediction vs Actual Internal Humidity
Second Order
Temperature
Figure 24: Second Order Prediction vs Actual Internal Temperature
Humidity
33
Figure 25: Second Order Prediction vs Actual Internal Humidity
An immediate takeaway from these visualization is that linear regression (both orders) display a better fit than
the ANNs implemented in this project. The observed l2 loss over the actual and predicted values are calculated,
and linear regression displays lower L2 Norm. This indicates greater prediction accuracy.
What is now evident however, is the magnitude of the benefits that the second order regression offers over the
first order. To do this, we now lay out the framework for an ANOVA (Analysis of Variance) test, which, given
multiple possible predictive models could be used to judge the relative significance of a model when compared
to other models. (For example, a comparison between first and higher order regression, and if one is
significantly better than the other) In the iPython notebook, we put forth the method for carrying out an ANOVA
F Test for a first order regression model, but stop short of comparing multiple models using the F-Test given
time constraints. Future work could include a robust ANOVA F Test to compare between various predictive
models to determine one that best fits the population of the observed data. As mentioned previously, error can
be due to either one of two reasons:
The lack of expressiveness in a regression model
Predictor variables
These together comprise of what we will refer to as regression error. The other component of error, not
mentioned previously, is vague and difficult to isolate. Therefore, we attribute it to unexplained "error".
In the iPython notebook, we compute, and lay out the formulation for ANOVA parameters such as degrees of
freedom, sum of squared errors, and mean of squared errors. (Total, Error, and Regression parameters for each)
Given the mean squared errors for the explainable error (MSreg) and unexplainable error (MSerr), the ANOVA F
test seeks to calculate the F-Statistic, which is calculated by finding the ratio between MSreg and MSerr. Large
values for the F-Statistic indicate that the mean of squared explainable error (MSreg) is large. This in turn
indicates that the fitted values (Ypred) are far away from the overall mean (Mean of predicted values).
Therefore, we can conclude that the expected response would change significantly along with the magnitude of
of the regression plane being used for regression. We notice F-Statistic values of first order temperature and
34
humidity prediction: 60.48 and 112.39 respectively, which are reasonably large when compared to values in the
F-statistic table. Therefore, there is strong evidence for the implementation of higher order models.
5.3 Causal Inference
For causal inference, we computed the variance-covariance matrix which is calculated as follows:
𝐴 = [(𝑋𝑇𝑋)−1]𝑀𝑆𝑒𝑟𝑟
Where, X is the input data, and MSerr is the mean squared unexplained error, also termed as the regression
variance.
The diagonal elements of the variance-covariance matrix give us a measure of the sensitivity of the final output to
a specific input, keeping all else constant. For example, cell 2x2 in the 'variance_vect_temp_o1' data frame in the
iPython notebook gives us the sensitivity of the internal temperature to the external humidity, keeping all else
constant. The matrix, therefore, gives us a measure of causal inference, giving us an ordered list of the factors that
the internal temperature is most sensitive to. Our findings are as follows:
Temperature
Table 5: Variance-Covariance Matrix - Temperature
As is clear from the diagonal elements of the matrix above, the internal temperature's sensitivity to the
predictor variables is as follows:
Wind Speed
External Temperature
Tied between Solar Radiation / External Humidity
Humidity
35
Table 6: Variance-Covariance Matrix - Humidity
As is clear from the diagonal elements of the matrix above, the internal humidity's sensitivity to the predictor
variables is as follows. Correlation is positive unless indicated otherwise:
External Humidity
Solar Radiation (Negative Correlation)
Wind Speed
External Temperature
6 Interactive Data Analysis and Visualization Platform Visionary Scenario
The real-time indoor environmental conditions and HVAC operation status are crucial to the facility managers.
Consider a color map illustrating the temperature and humidity profile inside the Hunt Library, which always
provides a direct understanding of the environmental condition.
The visualization is also convenient for operation maintenance. When some out-of-ranges occur, the system will
notify the facility managers via message and email. He could fetch the data over two weeks, view the changes,
and quickly find where the problem is. Then he is interested in the operating variance in some winters. He could
display the concerning data in one graph, and view the difference.
In general, visualization framework is so helpful that facility managers can utilize it combined with HVAC
operation and energy consumption data to make informed decisions for optimization and maintenance.
Conceptual Design
Here we propose the design we have envisioned for the visualization application. An extensible framework is
planned to integrate the data analysis and visualization, which consists of an interactive GUI tool and underlying
interfaces and implementations, illustrated below.
36
Figure 26: Unified Modeling Language (UML) of Data Visualization.
Data plugins are responsible for extracting data from the database and making it available for processing.
Multiple data sources are available including local .csv files, html data, or trending data from SQL. Then this kind
of raw data is transferred to the framework for analysis.
Display plugins use the analyzed data (processes like data cleaning, training, and testing) within the framework
to plot various diagrams. Possible visualizations include temperature and humidity time history diagram,
temperature and humidity color mapping inside Hunt Library, and further extensible visualization. One
commonly used displaying method is time-series diagram. Users could directly discern the trending of data with
respect to time. In this platform, the plotting of variables for any time periods is supported with simple analysis
of the data imported from the dataset. A user-friendly communication of the data gathered through the
network would be a temperature and humidity map. A design is suggested below – this could be implemented
as a displaying plugin.
For each moment, the indoor temperature and humidity could be derived through the networking measuring
system or predicted via the machine learning algorithm at proposed positions inside the hunt library. Therefore,
the mapping between the variables and the coordinates could be acquired. If we make linear interpolation for
every two nodes, and connect the points with the same value of the variables, we could draw the contour line
and the grading colors illustrating the distribution of the variables within the area of Hunt Library, as shown in
Figure 3.
37
Figure 27: Color map showing temperature distribution overlaid on the floor plan.
Implementation
There are two choices for visual software developing. One is web application, which stored all the content in a
remote server. Users could get access to the information through the browser interface via internet, like
Facebook or Twitter. The web applications are usually lightweight and easily accessible, but the exposure of the
information is likely to cause some security-related problems, which requires constant time and effort for
maintenance. Another option is local implementation, which, for example, could be launched from a Python
script.
Due to the learning objective and time-limit, our team chose to create a relatively simple local application in
Python, which is commonly used for data analyzing, visualization, and building applications. There are several
Python modules which can be used to help develop a graphical user interfaces (GUI) for an application. The ones
we explore include Tkinter [52], wxPython [53], and TraitsUI [54]. For producing plots and graphics, Matplotlib
[55] and Chaco [56] are packages that become useful. Although a complete implementation of the application
was not achieved, a valuable result is a survey of the different options and their characteristics.
Tkinter is the most popular graphical user interface (GUI) for Python. It can be used to develop visual
applications with buttons, drop down menus, text boxes, and many other interactive features. It is included with
the installation of Python [52]. WxPython is similar to Tkinter, but must be installed as a new package. We
investigated it because it supports transparency for graphical objects which is desired in our application [53].
TraitsUI allows for a GUI to be produced in an easier fashion via preset objects [54]. Matplotlib graphics can be
integrated into an application for any of the above modules. Chaco can be more easily utilized in the TraitsUI
module than Matplotlib. Our recommendation for the simplest implementation would be using TraitsUI and
leverages Chaco as a graphic producer. A local application is certainly something that a facility manager or office
worker could launch on their machine and interface with. However, if more time is allowed for development, a
web-based and mobile friendly application would be most useful for the modern user.
38
7 Conclusion
This project started by exploring the points of pain users of Hunt Library experience with regard to heating and
cooling. These were identified defined through user interviews and building surveys. We determined that the
indoor conditions are not comfortable for occupants. Our solution approach starts with quantifying the spatial
and temporal profiles of indoor temperature and humidity at an appropriate resolution for building would
objectively assess the problem. Integrating this knowledge with information on outdoor conditions and
mechanical systems data would allow for prediction through modeling. Inference could also be performed to
determine what factors drive indoor conditions. This information could be integrated into a visualization
application, allowing a building manager to make data driven decisions. Possible useful outcomes include
identification of unacceptable indoor conditions and identification of poor system performance.
We then assessed technologies that could be used to take measurements of indoor temperature and humidity.
We made a final selection to use the Photon and an integrated temperature and humidity sensor based on
technical specifications, feasibility, cost, and learning objectives. Prototyping was done with a two node sensor
network that was able to record temperature and humidity for 9 days in the library. We then scaled the system
to 6 nodes and redesigned software for higher performance. We collected data for 11 days on the fourth floor of
the library. Graphical exploration of the resulting data showed that there are large ranges in temperature on the
fourth floor and that temperature are humidity follow trends of outdoor conditions. The sensor network we
designed features mobile senor nodes powered by AA batteries with a life of over a month. A data management
solution is produced that reliably collects and stores measurements.
An artificial neural network model was developed that is able to predict temperature and humidity for a given
node based on outdoor conditions. The prediction for humidity was more accurate than the prediction for
temperature. A framework was established for how mechanical systems data could be integrated to provide an
improved prediction. A regression model based on higher order statistics was implemented for inference on
causal factors as well as a second method of prediction.
A visualization application was conceptually designed. Such an application would provide graphics on indoor
conditions including plots of temperature and humidity for particular nodes as well as heat maps that could
show the spatial profile overlaid on a floorplan. It would integrate results from prediction and inference models.
A survey of methods for implementation was completed.
8 References
[1] "Wireless Sensor Tags," [Online]. Available: http://wirelesstag.net/.
39
[2] "Photon - Particle - Particle Store," [Online]. Available: https://store.particle.io/collections/photon.
[3] "Photon Datasheet - Particle Docs," [Online]. Available: https://docs.particle.io/photon/photon-
datasheet/.
[4] "SimpleLink Wi-Fi CC3200 LaunchPad - Texas Instruments," [Online]. Available:
http://www.ti.com/tool/cc3200-launchxl.
[5] "CC3200 SimpleLink, Wi-Fi Internet-of-Things Solution," [Online]. Available:
http://www.ti.com/lit/ds/symlink/cc3200.pdf.
[6] "CC3200R1M1RGCR - TI store," [Online]. Available: https://store.ti.com/CC3200R1M1RGCR.aspx.
[7] "ZigBee - Texas Instruments," [Online]. Available:
http://www.ti.com/lsds/ti/wireless_connectivity/zigbee/overview.page.
[8] "ZigBee Wireless Standard - Digi International," [Online]. Available:
http://www.digi.com/resources/standards-and-technologies/rfmodems/zigbee-wireless-standard.
[9] "IEEE 802.15.4 - Wikipedia, the free encyclopedia," [Online]. Available:
https://en.wikipedia.org/wiki/IEEE_802.15.4.
[10] "ZigBee® (IEEE 802.15.4 / ZigBee PRO | Power Calculator," [Online]. Available:
http://www.ti.com/lsds/ti/wireless_connectivity/zigbee/power_calculator.page.
[11] "CC2538 | ZigBee (IEEE 802.15.4 / ZigBee PRO) | Wireless," [Online]. Available:
http://www.ti.com/product/CC2538/samplebuy.
[12] "RF Transceiver Modules: Digi-International XB24-AWI-001," [Online]. Available:
http://www.digikey.com/product-detail/en/digi-international/XB24-AWI-001/XB24-AWI-001-ND/935965.
[13] "Building a Wireless Sensor Network in Your Home," [Online]. Available:
http://computers.tutsplus.com/tutorials/building-a-wireless-sensor-network-in-your-home--cms-19745.
[14] "Arduino - ArduinoBoardUno," [Online]. Available: https://www.arduino.cc/en/Main/ArduinoBoardUno.
[15] "Arduino UNO Rev3 – Arduino Store USA," [Online]. Available: http://store-
usa.arduino.cc/products/a000066.
[16] "Raspberry Pi 2 Model B Project Board - 1GB RAM - 900 MHz Quad-Core CPU," [Online]. Available:
http://www.amazon.com/Raspberry-Pi-Model-Project-
Board/dp/B00T2U7R7I/ref=sr_1_2?s=pc&ie=UTF8&qid=1455768551&sr=1-2&keywords=raspberry+pi.
[17] "4-Step Guide to Choosing the Right Temperature Sensor," [Online]. Available:
http://www.dataloggerinc.com/content/resources/data_logging_tutorials/295/4-
step_guide_to_choosing_the_right_temperature_sensor/.
40
[18] "Thermocouples - OMEGA Engineering," [Online]. Available:
http://www.omega.com/prodinfo/thermocouples.html.
[19] "uxcell K Type -50-700C Thermocouple Probe Temperature," [Online]. Available:
http://www.amazon.com/uxcell-50-700C-Thermocouple-Temperature-Sensor/dp/B00D8337YW.
[20] "Capgo - Semiconductor Temperature Sensors," [Online]. Available:
http://www.capgo.com/Resources/Temperature/Semiconductor/Semi.html.
[21] "Specifying RTDs - Smart Sensors Inc.," [Online]. Available: http://www.smartsensors.com/specrtds.pdf.
[22] "RTD Specifications.," [Online]. Available: http://www.omega.com/Temperature/pdf/RTDSpecs_Ref.pdf.
[23] "AGPtek Stainless Steel PT100 RTD Thermistor Sensor Probe.," [Online]. Available:
http://www.amazon.com/AGPtek-Stainless-Thermistor-Sensor-Temperature/dp/B008YP1D04.
[24] "Amazon.com: Liquid tight RTD sensor, 34 mm probe, 1/8 NPT Thread," [Online]. Available:
http://www.amazon.com/Liquid-tight-sensor-probe-Thread/dp/B00BFF843O.
[25] "caldera spa temperature sensor thermistor ewgx272 #71578," [Online]. Available:
http://www.calderaspapartsplus.com/caldera-spa-temperature-sensor-thermistor-ewgx272-71578/.
[26] "10K Thermistor Sensors - TempSensing.com," [Online]. Available:
https://www.tempsensing.com/zc/index.php?main_page=index&cPath=4_5_9.
[27] "Choosing a Humidity Sensor: A Review of Three Technologies," [Online]. Available:
http://www.sensorsmag.com/sensors/humidity-moisture/choosing-a-humidity-sensor-a-review-three-
technologies-840.
[28] "Particle Store," [Online]. Available: https://store.particle.io/.
[29] "Wireless Sensor Tags," [Online]. Available: http://wirelesstag.net/.
[30] "CC3200 SimpleLink™ Wi-Fi® and Internet-of-Things solution, a Single-Chip Wireless MCU," [Online].
Available: http://www.ti.com/product/CC3200.
[31] "ARDUINO UNO REV3," [Online]. Available: http://store-usa.arduino.cc/products/a000066.
[32] "What Is Zigbee?," [Online]. Available: http://www.zigbee.org/.
[33] "Digital relative humidity & temperature sensor RHT03," [Online]. Available:
http://cdn.sparkfun.com/datasheets/Sensors/Weather/RHT03.pdf.
[34] "ZEEFO 2 Pack Wall Charger," [Online]. Available: http://www.amazon.com/ZEEFO-Charger-Quality-
Adapter-Samsung/dp/B00V7N3AMO/ref=sr_1_3?ie=UTF8&qid=1456174037&sr=8-
3&keywords=single+usb+charger.
41
[35] "AmazonBasics USB 2.0 Extension Cable," [Online]. Available: http://www.amazon.com/AmazonBasics-
Extension-Cable--Male--Female/dp/B00NH11PEY/ref=sr_1_5?s=pc&ie=UTF8&qid=1456174081&sr=1-
5&keywords=usb+extension+cable.
[36] "AmazonBasics Micro-USB to USB Cable 2-Pack - 3-Feet," [Online]. Available:
http://www.amazon.com/AmazonBasics-Micro-USB-USB-Cable-2-
Pack/dp/B00NH13O7K/ref=sr_1_6?ie=UTF8&qid=1459129379&sr=8-6&keywords=micro+USB+cable.
[37] "SparkFun Photon Battery Shield," [Online]. Available: https://www.sparkfun.com/products/13626.
[38] "250 mA Low Quiescent Current LDO Regulator," [Online]. Available:
http://ww1.microchip.com/downloads/en/DeviceDoc/22008E.pdf.
[39] "Discharge tests of AA Batteries, Alkaline and NiMH," [Online]. Available:
http://www.powerstream.com/AA-tests.htm.
[40] "Find the energy contained in standard battery sizes," [Online]. Available:
http://www.allaboutbatteries.com/Energy-tables.html.
[41] "ENERGIZER E91 Product Datasheet," [Online]. Available: http://data.energizer.com/PDFs/E91.pdf.
[42] "uxcell 2 Pcs 4 x AA 6V Battery Holder Case," [Online]. Available:
http://www.amazon.com/dp/B00HR93NJM/ref=sr_ph?ie=UTF8&qid=1459129674&sr=1&keywords=4+AA
+battery+holder.
[43] "AmazonBasics AA Performance Alkaline Batteries (48-Pack)," [Online]. Available:
http://www.amazon.com/AmazonBasics-Performance-Alkaline-Batteries-48-
Pack/dp/B00MNV8E0C/ref=pd_sim_sbs_21_1?ie=UTF8&dpID=61G-
GoYTzqL&dpSrc=sims&preST=_AC_UL160_SR160%2C160_&refRID=04KJF5H2VQPC3D4R2ZAZ.
[44] "Microchip Technology MCP1702-4002E/TO," [Online]. Available: http://www.digikey.com/product-
detail/en/microchip-technology/MCP1702-4002E%2FTO/MCP1702-4002E%2FTO-ND.
[45] "SPARKFUNRHT03," [Online]. Available:
https://build.particle.io/libs/55f712da4121e67e380006d6/tab/RHT03-Example-Serial.ino.
[46] "SparkFun Inventor's Kit for Photon Experiment Guide," [Online]. Available:
https://learn.sparkfun.com/tutorials/sparkfun-inventors-kit-for-photon-experiment-guide/experiment-6-
environment-monitor.
[47] "AET Insights," Accredited Environmental Technologies, Inc. , 2010. [Online]. Available:
http://aetinc.biz/newsletters/2010-insights/march-2010.
[48] J. Yang, "Building energy prediction with adaptive artificial neural networks," in Datta, D. S. (2000).
Application of neural networks for the prediction of the energy consumption in a supermarket. Proceedings
of the International Conference CLIMA, (pp. 98-107)., Montréal, 2005.
42
[49] D. S. Datta, "supermarket, Application of neural networks for the prediction of the energy consumption in
a," in Proceedings of the International Conference CLIMA, 2000.
[50] J. Salatas, "Implementation of Elman Recurrent Neural Network in WEKA," 10 September 2011. [Online].
Available: http://jsalatas.ictpro.gr/implementation-of-elman-recurrent-neural-network-in-weka/.
[51] I. S. G. E. H. Alex Krizhevsky, "ImageNet Classification with Deep Convolutional Neural Networks," [Online].
Available: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf. [Accessed 06 May 2016].
[52] "Tkinter," [Online]. Available: https://wiki.python.org/moin/TkInter.
[53] "wxPython," [Online]. Available: http://www.wxpython.org/.
[54] "TraitsUI," ENTHOUGHT, [Online]. Available: http://code.enthought.com/projects/traits_ui/.
[55] "matplotlib," [Online]. Available: http://matplotlib.org/.
[56] "Chaco," ENTHOUGHT, [Online]. Available: http://code.enthought.com/projects/chaco/.
[57] "Amazon.com: Teenitor 3.7V 600mAh 20C Lipo Battery," [Online]. Available:
http://www.amazon.com/Teenitor-600mAh-Battery-Charger-Parts/dp/B00LK0DY3O.
[58] "SparkFun Photon Battery Shield - DEV-13626 - SparkFun," [Online]. Available:
https://www.sparkfun.com/products/13626.
[59] "Solderless Plug-in BreadBoard, 830 tie-points, 2 Power lanes.," [Online]. Available:
http://www.amazon.com/Solderless-Plug-BreadBoard-tie-points-200PTS/dp/B005GYATUG.
[60] "iXCC 10ft 2pc Long USB2.0 - MicroUSB to USB Cable, A Male to Micro B Charge and Sync Cord For
Android/Samsung/Windows/MP3/Camera and other Device," [Online]. Available:
http://www.amazon.com/iXCC-10ft-2pc-Long-USB2-0/dp/B00DYWC0BI.
[61] "Amazon.com: MYT ® wall charger power supply adapter DC," [Online]. Available:
http://www.amazon.com/MYT-charger-supply-adapter-Barrel/dp/B00ZOHBE6I.
[62] "Bluetooth - Wikipedia, the free encyclopedia," [Online]. Available:
https://en.wikipedia.org/wiki/Bluetooth.
[63] "Building a Wireless Sensor Network in Your Home," [Online]. Available:
http://computers.tutsplus.com/tutorials/building-a-wireless-sensor-network-in-your-home--cms-19745.
[64] "Xively by LogMeIn," [Online]. Available: https://xively.com/.
[65] "Bluetooth® 4.0 Low Energy Single Mode Smart Sensors," [Online]. Available:
http://www.blueradios.com/hardware_sensors.htm.
[66] W. D. Jankowski, "Survey of Neural Transfer Functions," [Online]. Available:
ftp://ftp.icsi.berkeley.edu/pub/ai/jagota/vol2_6.pdf. [Accessed 4 April 2016].
43
[67] "Board Mount Temperature Sensors TEMP SENS, RTD platnum-clad NI wire (1 piece)," [Online]. Available:
http://www.amazon.com/Board-Mount-Temperature-Sensors-platnum-
clad/dp/B005T9O81O/ref=sr_1_28?s=industrial&ie=UTF8&qid=1455748360&sr=1-
28&keywords=rtd+temperature+sensor.
[68] "Humidity sensor-AM1001 AOSONG - 1 / 5 Pages," [Online]. Available:
http://pdf.directindustry.com/pdf/aosong-electronics-co-ltd/humidity-sensor-am1001-aosong/121567-
472713.html.
[69] "HDC1080 Low Power, High Accuracy Digital Humidity Sensor with Temperature Sensor," [Online].
Available: http://www.ti.com/product/HDC1080.
[70] "Product Datasheet - Energized E95," [Online]. Available: http://data.energizer.com/PDFs/e95.pdf.
[71] I. S. G. E. H. Alex Krizhevsky, "ImageNet Classification with Deep Convolutional Neural Networks," [Online].
Available: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf. [Accessed 06 May 2016].
9 Appendix
9.1 Sensor Node Software: Photon C++ Script Below we show the code for Node 1. The rest of the 6 nodes are the same, except that any number identifier for
node 1 is changed to the corresponding node.
// Flash this code to Node 1
// Node one device ID: 32001d001347343432313031
// Include the library for RHT03 - digital sensor
#include "SparkFunRHT03/SparkFunRHT03.h"
// Define the digital pin which receives data from RHT03
const int RHT_SENSOR_PIN = D1;
// Create variables to hold temperature & humidity
float temp;
float hum;
String data_str;
// Create object from which attributes to get temperature and humidity can be used
RHT03 rht;
void setup() {
RGB.brightness(5); // set the LED brightness to low
rht.begin(RHT_SENSOR_PIN); // initiate sensor
}
void loop() {
// Update values which will be saved at each instant on cloud
int update=rht.update();
// if update fails, try again repeatly until success
while(update!=1){
delay(1100); // allow time to reset
update=rht.update();
}
// put the data into variables
temp = rht.tempF();
hum = rht.humidity();
// convert to string
data_str="temp1:"+String(temp,2)+"humd1:"+String(hum,2);
// publish data to 12745 event
Particle.publish("12745_data",data_str,600);
// delay 4.5 seconds
delay(4500);
// sleep for 10 minutes - 4.5 sec
System.sleep(SLEEP_MODE_DEEP, 595.5);
}
1
9.2 Raspberry Pi: Logger Python Script
######################################
### Data Logging Script ###
### AIS Project, 2016 ###
### Hunt Library Heating & Cooling ###
### Joseph Dryer & Alex Shu ###
######################################
###############################
### IMPORT REQUIRED MODULES ###
###############################
import Queue, threading, time, datetime, csv
from pytz import timezone
from sseclient import SSEClient #https://pypi.python.org/pypi/sseclient/0.0.8
###############################
### DEFINE FUNCTIONS ###
###############################
### FUNCTION TO RETRIEVE THE CURRENT TIME ###
def get_time():
# Return the curent time as a string in format "yyyy/mm/dd hh:mm AM/PM"
return datetime.datetime.now(timezone('US/Eastern'))\
.replace(tzinfo=None).strftime("%Y-%m-%d %I:%M %p")
### FUNCTION TO SETUP CSV FILES FOR WRITING ###
def init_csv(node):
'''
Creates a CSV file for a node's data to be stored in.
'''
# Create a CSV file with the node number and current date embedded in the file
name
date=datetime.datetime.now(timezone('US/Eastern')).replace(tzinfo=None).date()
file_name='datafile'+str(date)+"-node"+str(node)+'.csv'
# create/open the file
file=open(file_name,'wb')
write=csv.writer(file)
# string identifier for time, temperature, humidity columns with node number
time,t,h='time'+str(node),'t'+str(node),'h'+str(node)
# add the colum headers
write.writerow([time,t,h])
file.close()
# return the file name for later use
return file_name
####################################
### DEFINE CLASSES / THREADS ###
####################################
### LISTENER - LISTENS FOR SSEs, PUTS DATA IN QUEUE
class listener(threading.Thread):
def
__init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
super(listener,self).__init__()
self.target=target
2
self.name=name
print "Initializing Listener..."
def run(self):
events=SSEClient(url)
# pre-set these to false - when data comes in, the values will be set
t,h=False,False
for event in events:
# convert unicode to string
#parse to appropriate indices to find desired value
data=event.data.encode('ascii','ignore')[9:31]
if data=="": # if there is no data (ghost event)
continue # skip the rest of the iteration of loop
if data[0:4]=='temp': # check that it's our event, or the below will
make errors
temp_data=data[0:11] # this is the data for temperature e.g.
temp1:70.02
humd_data=data[11:] # this is the data for humidity e.g.
humd1:40.50
node=int(temp_data[4]) # get node number by parsing name string
t=temp_data[6:] # get value
h=humd_data[6:] # get value
now=get_time() # get time
q.put((node,now,t,h)) # put the data in the queue as a tuple
global event_time # set event_time to global variable to update it
event_time=time.time() # record the event time (for the timer
thread)
### LOGGER - PULLS FROM THE QUEUE AND SAVES DATA
class logger(threading.Thread):
def
__init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
super(logger,self).__init__()
self.target=target
self.name=name
print "Initializing Logger..."
def run(self):
while True:
if not q.empty(): # if the queue has data in it...
data=q.get() # grab a piece of data
# decode the tuple
node,time_stamp,temp,hum = data[0],data[1],data[2],data[3]
# find the file name
file_name=file_names[node-1]
file=open(file_name,'a') #open the file
write=csv.writer(file)
row=[time_stamp,temp,hum]
write.writerow(row) # write to the file
print [node]+row # print the data
file.close() # close the file
time.sleep(30) # sleep 30 seconds... avoid using all the CPU!
return
### TIMER - RESTARTS THE LISTENER IF IT HANGS ###
class timer(threading.Thread):
def
__init__(self,group=None,target=None,name=None,args=(),kwargs=None,verbose=None):
super(timer,self).__init__()
3
self.target=target
self.name=name
print "Initializing Timer..."
def run(self):
while True:
global event_time
# if 2 hours of no data, restart listener
if (time.time()-event_time)>2*60*60:
# reinstantiate listener
li=listener(name='listener')
li.start()
# restart timer
event_time=time.time()
time.sleep(60*60) # sleep for an hour (you don't want this to run
constantly)
return
###############################
### MAIN ###
###############################
if __name__ == '__main__':
### INITIALIZE FILES AND FILE NAMES
num_nodes=6 # set the number of nodes
file_names=[] # list for file names for each node
for node in range(num_nodes):
# create files for each node
# add each file name to the list
file_names.append(init_csv(node+1))
print file_names
### SET INITIAL EVENT TIME TO NOW
global event_time
event_time=time.time()
### DEFINE URLs TO CLOUD EVENT ###
url='https://api.particle.io/v1/events/12745_data?access_token=cba096a0629d2c9609ee
ff7f662588b4b32e8218'
### CREATE A QUEUE TO PASS DATA BETWEEN THREADS ###
q=Queue.Queue(0) # no size limit
### CREATES INSTANCES OF THE THREADS
li=listener(name='listener')
lo=logger(name='logger')
ti=timer(name='timer')
### INITIALIZE THE THREADS
li.start()
lo.start()
ti.start()
4
9.3 Raspberry Pi Dropbox Backup Python Script In addition to the below python script, the Dropbox-Uploader shell script must be downloaded (follow the instructions in the source mentioned in the report body) such that it can be called from the python script. Here’s the link to Github: https://github.com/andreafabrizi/Dropbox-Uploader
from subprocess import call
import time
import pandas as pd
while True:
file="/home/pi/Dropbox-Uploader/dropbox_uploader.sh upload
/home/pi/photon/logger logger"
call([file], shell=True)
time.sleep(600*6*3)
9.4 Group Account Information for Gmail, Dropbox, Particle Build The username and password for the Gmail, Dropbox account, and Particle Build account for the group are as
follows:
Username: [email protected]
Password: huntlibrary
The Particle Build account will give access to the code currently on the Photon as well allow over the air updates.
To flash code to the Photons, they must be turned off and then turned on while holding the SETUP button. The
LED will flash magenta – let go. Then the Photon will be in cloud mode and will be ready to receive over-the-air
code.
9.5 Ipython Notebook for Plotting Data The Ipython Notebook is attached as a pdf in the pages to follow.
9.6 Data Analysis Ipython Notebook The Ipython Notebook is attached as a pdf in the pages to follow.
Ipython Notebook - Plotting Temperature and Humidity Data
Import Required Packages
In [1]: import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import datetime as datetime
import numpy as np
%matplotlib inline
Two Node System
In [2]: df2=pd.read_csv("datafile2016-04-07.csv", sep=’,’)
In [3]: df2[’time’]=pd.to_datetime(df2[’time’],format="%Y-%m-%dT%H:%M:%S")+pd.DateOffset(hours=-4)
In [4]: df2=df2.set_index(’time’)
In [5]: df2=df2.convert_objects(convert_numeric=True).dropna();
C:\Users\jdryer\Anaconda2\lib\site-packages\ipykernel\ main .py:1: FutureWarning: convert objects is deprecated. Use the data-type specific converters pd.to datetime, pd.to timedelta and pd.to numeric.
if name == ’ main ’:
In [6]: plot=df2[[’t1’,’t2’]].plot(style=’.’,fontsize=14,rot=45,\
ylim=(75,84),use_index=True,figsize=(15,5),grid=True)
plot.set_ylabel(’Temperature (F)’)
plot.set_xlabel(’Time Stamp’)
plot.xaxis.set_major_formatter(dates.DateFormatter(’%m/%d’))
plot.legend([’Node 1’,’Node 2’],loc=’lower right’,ncol=2)
Out[6]: <matplotlib.legend.Legend at 0x9363908>
In [7]: plot=df2[[’r1’,’h2’]].plot(style=’.’,fontsize=14,rot=45,ylim=(15,42),use_index=True,figsize=(15,5),grid=True)
plot.set_ylabel(’Relative Humidity (%)’)
plot.set_xlabel(’Time Stamp’)
plot.xaxis.set_major_formatter(dates.DateFormatter(’%m/%d’))
plot.legend([’Node 1’,’Node 2’],loc=’upper right’,ncol=2)
1
Out[7]: <matplotlib.legend.Legend at 0x65ed9b0>
In [8]: df2=df2.resample(’10T’,how=’mean’)
In [9]: df2.columns=(’t1’,’h1’,’t2’,’h2’)
In [10]: df2.head()
Out[10]: t1 h1 t2 h2
time
2016-04-06 21:00:00 78.98 21.0 76.64 21.65
2016-04-06 21:10:00 78.98 21.4 76.64 21.95
2016-04-06 21:20:00 78.98 21.5 76.64 22.00
2016-04-06 21:30:00 78.98 21.7 76.64 22.20
2016-04-06 21:40:00 78.98 21.7 76.64 22.60
Six Node System
In [11]: df6=pd.read_excel(’6 Node System Data.xlsx’,sheetname=None)
In [12]: node_1=df6[’Node 1’].set_index(’time1’)
node_2=df6[’Node 2’].set_index(’time2’)
node_3=df6[’Node 3’].set_index(’time3’)
node_4=df6[’Node 4’].set_index(’time4’)
node_5=df6[’Node 5’].set_index(’time5’)
node_6=df6[’Node 6’].set_index(’time6’)
In [13]: ax1=node_1.t1.plot(rot=45,fontsize=14,ylim=(75,83),style=’.’,figsize=(15,5))
node_2.t2.plot(ax=ax1,style=’.’)
node_3.t3.plot(ax=ax1,style=’.’)
node_4.t4.plot(ax=ax1,style=’.’)
node_5.t5.plot(ax=ax1,style=’.’)
node_6.t6.plot(ax=ax1,style=’.’)
ax1.legend([’Node 1’,’Node 2’,’Node 3’,’Node 4’,\
’Node 5’,’Node 6’],loc=’lower center’,ncol=6)
ax1.xaxis.set_major_formatter(dates.DateFormatter(’%m/%d’))
ax1.set_ylabel(’Temperature (F)’,fontsize=14)
ax1.set_xlabel(’Date’,fontsize=14)
ticks=pd.Series(node_3.index).map(pd.Timestamp.date).unique()
plt.xticks(ticks[1:]);
2
In [14]: ax2=node_1.h1.plot(rot=45,fontsize=14,ylim=(17,55),style=’.’,figsize=(15,5))
node_2.h2.plot(ax=ax2,style=’.’)
node_3.h3.plot(ax=ax2,style=’.’)
node_4.h4.plot(ax=ax2,style=’.’)
node_5.h5.plot(ax=ax2,style=’.’)
node_6.h6.plot(ax=ax2,style=’.’)
ax2.legend([’Node 1’,’Node 2’,’Node 3’,’Node 4’,’Node 5’,’Node 6’])
ax2.legend(loc=’lower right’,ncol=6)
ax2.set_ylabel(’Relative Humidity’,fontsize=14)
ax2.legend([’Node 1’,’Node 2’,’Node 3’,’Node 4’,’Node 5’,’Node 6’],loc=’lower center’,ncol=6)
ax2.xaxis.set_major_formatter(dates.DateFormatter(’%m/%d’))
ax2.set_ylabel(’Relative Humidity (%)’,fontsize=14)
plt.xticks(pd.Series(node_3.index).map(pd.Timestamp.date).unique()[1:]);
ax2.set_xlabel(’Date’,fontsize=14)
Out[14]: <matplotlib.text.Text at 0x9f59ef0>
In [15]: node_1=node_1.resample(’10T’,how=’mean’)
node_2=node_2.resample(’10T’,how=’mean’)
node_3=node_3.resample(’10T’,how=’mean’)
node_4=node_4.resample(’10T’,how=’mean’)
node_5=node_5.resample(’10T’,how=’mean’)
node_6=node_6.resample(’10T’,how=’mean’)
3
In [16]: df6=pd.concat([node_1,node_2,node_3,node_4,node_5,node_6],axis=1)
In [17]: df_i=df6.append(df2)
Comparing to Outdoor Conditions
In [18]: df_o=pd.read_csv(’WeatherData_1.csv’)
In [19]: df_o.columns=[’time’,’rad’,’hum’,’temp’,’wind’]
In [20]: df_o[’time’]=pd.to_datetime(df_o[’time’],format="%m/%d/%Y %H:%M")
In [21]: df_o=df_o.set_index(’time’).convert_objects(convert_numeric=True).dropna()
C:\Users\jdryer\Anaconda2\lib\site-packages\ipykernel\ main .py:1: FutureWarning: convert objects is deprecated. Use the data-type specific converters pd.to datetime, pd.to timedelta and pd.to numeric.
if name == ’ main ’:
In [22]: df_o=df_o.resample(’10T’,how=’mean’)
In [23]: df_all=pd.concat([df_i,df_o],axis=1)
In [24]: ax=df_all[[’t1’,’t2’,’t3’,’t4’,’t5’,’t6’,’temp’]].plot(figsize=(15,5),style=’.’)
ax.legend([’Node 1’,’Node 2’,’Node 3’,’Node 4’,’Node 5’,’Node 6’,\
’Outdoor Temperature’],loc=’upper right’,ncol=7)
ax.set_ylabel(’Temperature (F)’,fontsize=14)
ax.set_xlabel(’Date’,fontsize=14)
Out[24]: <matplotlib.text.Text at 0x9a452b0>
In [25]: ax=df_all[[’h1’,’h2’,’h3’,’h4’,’h5’,’h6’,’hum’]].plot(figsize=(15,5),style=’.’)
ax.legend([’Node 1’,’Node 2’,’Node 3’,’Node 4’,’Node 5’,’Node 6’,\
’Outdoor Humidity’],loc=’lower right’,ncol=7)
ax.set_ylabel(’Relative Humidity (%)’,fontsize=14)
ax.set_xlabel(’Date’,fontsize=14)
Out[25]: <matplotlib.text.Text at 0xbd6a6d8>
4
Data Analysis: Modeling for Prediction and Inference
May 6, 2016
Node 1: Data Import, ANN, L.Reg, and Statistical Inference
Data Import and Cleaning
First, we import libraries that we will need to carry out the data analysis. Libraries will also be importedfurther into the code as the need comes up.
In [1]: %matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats
from scipy import linalg
import matplotlib
import matplotlib.pyplot as plt
pd.set_option(’display.mpl_style’, ’default’)
pd.options.display.max_rows =5000
We import all the nodal data from the sensor network collected over the last three weeks using the xlrdlibrary.
In [2]: import xlrd
xl = pd.ExcelFile("6 Node System Data.xlsx")
sensordata_pd = xl.parse("Node 1")
#Imports data from the node 1 sheet.
Next, we convert strings of imported parameter values to floating points using the convert objects()method.
In [3]: sensordata_pd=sensordata_pd.convert_objects(convert_numeric=True)
sensordata_pd.head()
Out[3]: time1 t1 h1
0 2016-04-22 23:30:00 78.44 36.2
1 2016-04-22 23:40:00 79.88 36.4
2 2016-04-22 23:50:00 79.70 36.4
3 2016-04-23 00:00:00 79.88 36.3
4 2016-04-23 00:10:00 79.88 36.2
Conversion of the datetime index to a numpy.datetime datatype is essential for resampling and databaseoperations performed later on in this notebook. We convert using the to datetime method in the Pandasdocumentation.
In [4]: sensordata_pd[’time1’]=pd.to_datetime(sensordata_pd.time1.values,\
format=’%Y-%m-%d %H:%M:%S.%f’)
#Converts time1 to a timestamp from a string
sensordata_pd.head()
1
Out[4]: time1 t1 h1
0 2016-04-22 23:30:00 78.44 36.2
1 2016-04-22 23:40:00 79.88 36.4
2 2016-04-22 23:50:00 79.70 36.4
3 2016-04-23 00:00:00 79.88 36.3
4 2016-04-23 00:10:00 79.88 36.2
In [5]: type(sensordata_pd.time1.values[0])
#Timestamp conversion in place
Out[5]: numpy.datetime64
We rename ‘t1’ and ‘h1’ to make them more readable.
In [6]: sensordata_pd.rename(columns={’time1’: ’Time_Stamp’, \
’t1’: ’Temp_1’,’h1’: ’Hum_1’}, inplace=True)
#Renames data variables
sensordata_pd.head()
Out[6]: Time Stamp Temp 1 Hum 1
0 2016-04-22 23:30:00 78.44 36.2
1 2016-04-22 23:40:00 79.88 36.4
2 2016-04-22 23:50:00 79.70 36.4
3 2016-04-23 00:00:00 79.88 36.3
4 2016-04-23 00:10:00 79.88 36.2
As mentioned previously, it is essential for the timestamp to be set as the index of the dataframe, giventhe need to perform resampling and database joins.
In [7]: sensordata_pd=sensordata_pd.set_index(’Time_Stamp’)
#Makes timestamp the index
sensordata_pd.head()
Out[7]: Temp 1 Hum 1
Time Stamp
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
Prior to resampling, we ensure that any unsampled datapoints and their corresponding time indices areremoved from the dataframe.
In [8]: sensordata_pd= sensordata_pd.dropna(axis=0, how=’any’,\
thresh=None, subset=None, inplace=False)
#Drops nan/nat values in the input data
sensordata_pd.head()
Out[8]: Temp 1 Hum 1
Time Stamp
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
2
In [9]: type(sensordata_pd.index.values[0])
#Confirms that the index dtype is datetime
Out[9]: numpy.datetime64
In [10]: sensordata_pd.shape
#Shape of the dataset subsequent to removal of nan values
Out[10]: (1480, 2)
Resampling the data is a method to ensure that the data acquisition timestamp instances align with asubset of timestamp values for data collected from the Centre for Building Performance and Diagnostics.This would also, in the future, help alignment with internal HVAC system data from Hunt Library.
In [11]: sensordata_pd=sensordata_pd.resample(’10T’,how=’mean’)
#Resamples to align timestamps with 10 minute intervals by
#taking means of intermediate variables
sensordata_pd.head()
Out[11]: Temp 1 Hum 1
Time Stamp
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
In [12]: sensordata_pd.shape
#Checks shape of resampled dataframe. Notice the change in shape.
Out[12]: (1671, 2)
Again, we drop any null values in the resampled dataframe.
In [13]: sensordata_pd= sensordata_pd.dropna(axis=0, how=’any’, \
thresh=None, subset=None, inplace=False)
#Removes nan/nat values from the resampled data (Precautionary)
sensordata_pd.head()
Out[13]: Temp 1 Hum 1
Time Stamp
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
In [14]: sensordata_pd.shape #Shape of the cleaned resampled data
Out[14]: (1446, 2)
Now, we visualize the temperature and humidity time series, as observed at Node 1.
In [15]: plot_1=sensordata_pd.plot(figsize=[20,8])
#Graphs all the variables with respect to time
3
Outlier Removal
Next, we seek to isolate and remove data points that are outliers, and not representative of the range ofvalues that you would expect to see for these variables. These outliers could be a result of faulty collectionequipment or adversity in external factors that would cause a spike in temperature or humidity. A fire,for example, would cause an outlier in the observed data. To do this, we assume that, asymptotically, thetemperature and humidity readings converge on a normal distribution. Given the characteristics of a normaldistribution, it is improbable that any accurate readings over 3 standard deviations from the mean occur.Therefore, we remove all such data points.
In [16]: sensordata_pd=sensordata_pd[(np.abs(stats.zscore(sensordata_pd)) < 3).all(axis=1)]
#Computing the Z-Statistic and removing elements with Z-Score greater than 3
sensordata_pd.shape
Out[16]: (1430, 2)
Underlying Z-Statistic Method with an example:df = pd.DataFrame(np.random.randn(100, 3), columns=list(‘ABC’))df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]To filter the DataFrame where only ONE column (e.g. ‘B’) is within three standard deviations:df[((df.B - df.B.mean()) / df.B.std()).abs() < 3] OR try to replace the all method with any
In [17]: sensordata_pd= sensordata_pd.dropna(axis=0,\
how=’any’, thresh=None, subset=None, inplace=False)
#Drops any Nan Values after removal of outliers (Precautionary/Sanity Check)
sensordata_pd.shape
Out[17]: (1430, 2)
Now, we go back and visualize the temperature and humidity time series, as observed at Node 1. Noticehow the outliers observed in plot 1 are no longer visible here.
In [18]: plot_2=sensordata_pd.plot(figsize=[20,8])
4
Input Data
Next, we import external data collected by the Centre for Building Performance and Diagnostics from thetime period from the start of March 2016, and through till the afternoon of May 4th, 2016. The procedureto import this data and process it is identical to the one described above. Therefore, we will not go into adetailed description.
In [19]: Campus_Data=pd.read_csv(’WeatherData_1.csv’)
#Imports Data
Campus_Data.head()
Out[19]: Unnamed: 0 Global Solar Radiation Humidity Temperature Wind Speed
0 NaN W/m2 % F mph
1 3/1/16 0:00 1.0 63.3 43.7 1.7
2 3/1/16 0:05 1.0 63.5 43.7 0.0
3 3/1/16 0:10 1.0 63.9 43.5 0.0
4 3/1/16 0:15 1.0 63.0 43.8 1.7
In [20]: Campus_Data=Campus_Data.convert_objects(convert_numeric=True)
In [21]: type(Campus_Data.Humidity.values[0])
Out[21]: numpy.float64
In [22]: Campus_Data.rename(columns={’Unnamed: 0’:\
’TimeStamp’, ’Global Solar Radiation’:\
’Solar_Radiation’,’Humidity’: ’Humidity’,\
’Temperature’: ’Temperature’,\
’Wind Speed’: ’Wind_Speed’}, inplace=True)
Campus_Data.head()
Out[22]: TimeStamp Solar Radiation Humidity Temperature Wind Speed
0 NaN NaN NaN NaN NaN
1 3/1/16 0:00 1 63.3 43.7 1.7
2 3/1/16 0:05 1 63.5 43.7 0.0
3 3/1/16 0:10 1 63.9 43.5 0.0
4 3/1/16 0:15 1 63.0 43.8 1.7
5
In [23]: Campus_Data.shape
Out[23]: (18614, 5)
In [24]: Campus_Data= Campus_Data.dropna(axis=0, how=’any’, thresh=None, subset=None, inplace=False)
Campus_Data.head()
Out[24]: TimeStamp Solar Radiation Humidity Temperature Wind Speed
1 3/1/16 0:00 1 63.3 43.7 1.7
2 3/1/16 0:05 1 63.5 43.7 0.0
3 3/1/16 0:10 1 63.9 43.5 0.0
4 3/1/16 0:15 1 63.0 43.8 1.7
5 3/1/16 0:20 1 63.0 43.7 0.6
In [25]: Campus_Data.shape
Out[25]: (18593, 5)
In [26]: Campus_Data.TimeStamp.values
Out[26]: array([’3/1/16 0:00’, ’3/1/16 0:05’, ’3/1/16 0:10’, ..., ’5/4/16 14:50’,
’5/4/16 14:55’, ’5/4/16 15:00’], dtype=object)
In [27]: Campus_Data[’TimeStamp’]=pd.to_datetime(Campus_Data.TimeStamp.values\
,infer_datetime_format=True, format=’%m/%d/%Y %H:%M’)
type(Campus_Data.TimeStamp.values[1])
Out[27]: numpy.datetime64
In [28]: Campus_Data=Campus_Data.set_index(’TimeStamp’)
#Makes timestamp the index
Campus_Data.head()
Out[28]: Solar Radiation Humidity Temperature Wind Speed
TimeStamp
2016-03-01 00:00:00 1 63.3 43.7 1.7
2016-03-01 00:05:00 1 63.5 43.7 0.0
2016-03-01 00:10:00 1 63.9 43.5 0.0
2016-03-01 00:15:00 1 63.0 43.8 1.7
2016-03-01 00:20:00 1 63.0 43.7 0.6
In [29]: Campus_Data=Campus_Data.resample(’10T’,how=’mean’)
#Resamples to align timestamps with 10 minute intervals
#by taking means of intermediate variables
Campus_Data.head()
Out[29]: Solar Radiation Humidity Temperature Wind Speed
TimeStamp
2016-03-01 00:00:00 1 63.40 43.70 0.85
2016-03-01 00:10:00 1 63.45 43.65 0.85
2016-03-01 00:20:00 1 62.95 43.70 0.85
2016-03-01 00:30:00 1 62.60 43.80 1.70
2016-03-01 00:40:00 1 63.00 43.50 0.85
In [30]: plot_3=Campus_Data.plot(figsize=[20,8])
6
Merging Data
Moving closer to implementing the ANN and linear regression, we now proceed to merge the two data sets:One, the data for internal temperature and humidity that is collected from the sensor network in Hunt.Two, the external temperature, humidity, windspeed and solar radiation data collected from the Centrefor Building Performance and Diagnostics. We do this by carrying out an outer database join, which onlycombines instances from the two dataframes that consist of data from the same instant in time.
In [31]: merged_matrix=Campus_Data.join(sensordata_pd, how=’inner’, rsuffix=’Time_Stamp’)
#Performs outer join
merged_matrix.head()
Out[31]: Solar Radiation Humidity Temperature Wind Speed \2016-04-22 23:30:00 1 89.25 60.00 2.55
2016-04-22 23:40:00 1 89.30 59.80 4.20
2016-04-22 23:50:00 1 88.85 59.65 3.90
2016-04-23 00:00:00 1 88.45 59.50 3.35
2016-04-23 00:10:00 1 87.90 59.40 3.40
Temp 1 Hum 1
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
In [32]: merged_matrix.shape
Out[32]: (1430, 6)
In [33]: merged_matrix= merged_matrix.dropna(axis=0, how=’any’, thresh=None, \
subset=None, inplace=False)
#Drops Nan/Nat values (Sanity Check)
merged_matrix.head()
7
Out[33]: Solar Radiation Humidity Temperature Wind Speed \2016-04-22 23:30:00 1 89.25 60.00 2.55
2016-04-22 23:40:00 1 89.30 59.80 4.20
2016-04-22 23:50:00 1 88.85 59.65 3.90
2016-04-23 00:00:00 1 88.45 59.50 3.35
2016-04-23 00:10:00 1 87.90 59.40 3.40
Temp 1 Hum 1
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
In [34]: print merged_matrix.shape
(1430, 6)
Next, in order to understand the time series behavior of each one of the predictor/input variables, weplot observed values of these variables with respect to time.
In [35]: plot_4=merged_matrix.plot(figsize=[20,8])
Artificial Neural Network
Now, we import libraries required to implement the Artificial Neural Network.
In [203]: import sys
import numpy.random
import random
In [204]: traintestData_raw = merged_matrix[[’Solar_Radiation’,’Humidity’,\
’Temperature’,’Wind_Speed’,’Temp_1’,’Hum_1’]]
traintestData_raw.head()
Out[204]: Solar Radiation Humidity Temperature Wind Speed \2016-04-22 23:30:00 1 89.25 60.00 2.55
2016-04-22 23:40:00 1 89.30 59.80 4.20
8
2016-04-22 23:50:00 1 88.85 59.65 3.90
2016-04-23 00:00:00 1 88.45 59.50 3.35
2016-04-23 00:10:00 1 87.90 59.40 3.40
Temp 1 Hum 1
2016-04-22 23:30:00 78.44 36.2
2016-04-22 23:40:00 79.88 36.4
2016-04-22 23:50:00 79.70 36.4
2016-04-23 00:00:00 79.88 36.3
2016-04-23 00:10:00 79.88 36.2
In [205]: len(traintestData_raw.index)
Out[205]: 1430
We initialize training data to be roughly 90% of the samples. Testing is carried out on the remaining10%.
In [206]: trainData=traintestData_raw.loc[’20160422’:’20160501’]
len(trainData.index)
Out[206]: 1080
The sigmoidal unit is the transformation function of each node within the ANN. That is, it is the rulethat governs the output value of a node given a input that is equal to the linear combination of nodal valuesfrom the previous layer. Sigmoidal units are not zero centred, and are prone to saturation for large inputvalues. However, input value magnification to the extent where the sigmoid unit would return a zero valueand kill off that particular branch of the network are unlikely in ANNs with one hidden layer. However,given a more convoluted neural network, other activation functions like the Tanh, RELU, and Leaky RELUmight be considered. We first define the sigmoid function for the forward-propogation section of the ANN,and the derivative of the sigmoid function for use in the gradient back-propogation section of the ANN.
In [207]: # sigmoid unit
def sigmoid(x):
return 1.0 / (1 + np.exp(-x))
# derivative of sigmoid unit
def derivSigmoid(x):
return x * (1 - x)
Over the output layer, we compute the L2 Norm between predicted output variables and actual outputvariables. In an ANN, we update weight vectors at each iteration with the objective of minimising this l2norm over the output layer. Below, we describe a function to calcualate the output error at the end of eachiteration of the ANN.
In [208]: def getSquareError(outputError):
result = 0
(rows, cols) = outputError.shape
for i in range(rows):
result += outputError[i, cols-1]**2
return result
Below, we define the procedure for forward propogation of inputs through the ANN, which then undergo aseries of nodal transformations and result in an output. In directional terms, the process of back-propogationis the exact opposite. Subsequent to forward propogation, we calculate the losses over the output layer, afterwhich we propogate the gradient calculated over the loss backward through the network. This process utilizesthe chain rule, and we end with the expression for the gradient of the loss function with respect to each ofthe inputs. Below, we lay out the logic for these two essential components of the ANN algorithm within thefunction ‘learnprocess’.
9
In [209]: def learnProcess(trainExamples, resultExamples,i):
exampleNum = len(trainExamples)
attrNum = len(trainExamples[0])
resultNum = len(resultExamples[0])
# weight array from input to hidden layer
w0To1 = (np.random.random((attrNum, NODENUM_HL))* RANDOM_RAGE - 0.5 * RANDOM_RAGE)
# weight array from hidden layer to output
w1To2 = (np.random.random((NODENUM_HL, resultNum))* RANDOM_RAGE - 0.5 * RANDOM_RAGE)
squareError = getSquareError(resultExamples)
layer0 = trainExamples
layer1 = sigmoid(np.dot(trainExamples, w0To1))
layer2 = sigmoid(np.dot(layer1, w1To2))
oError = resultExamples - layer2
curError = getSquareError(oError)
# print(curError)
n = 1
print("Inital square error:", squareError)
# back propagation process
while (curError < squareError and n < ITERATION_NUM):
squareError = curError
# case1 training rule for output unit weight
deltaL2 = LEARNING_RATE * oError * derivSigmoid(layer2)
#dL/dx Calculates the derivative of the loss function over
w1To2 += np.dot(layer1.T, deltaL2)
hError = np.dot(deltaL2, w1To2.T)
# case2 training rule for hidden unit weight
deltaL1 = hError * derivSigmoid(layer1)
w0To1 += LEARNING_RATE * np.dot(layer0.T, deltaL1)
layer0 = trainExamples
layer1 = sigmoid(np.dot(trainExamples, w0To1))
layer2 = sigmoid(np.dot(layer1, w1To2))
oError = resultExamples - layer2
curError = getSquareError(oError)
n += 1
print("Total iteration:", n)
print("Final square error:", curError)
node_err[i-1][0]=i
node_err[i-1][1]=curError
return (w0To1, w1To2)
We carry out ANN training individually for temperature and humidity prediction. The following sectionconsists of the training process for temperature prediction. We use a learning rate of 0.001 (Also called the
10
step size) for altering weights at each iteration with the objective of minimising the squared loss (l2 Norm)over the output layer. This iteration is carried out through a process called gradient descent. Further, wealso loop through hidden layer sizes (in terms of the number of nodes) to determine the hidden layer sizethat minimises training error.
Temperature Training
In [210]: # learning rate
LEARNING_RATE = 0.001 #Gradient descent step size
RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization
# iteration number
ITERATION_NUM = 150000
node_err=(10,2)
node_err=np.ones(node_err)
for i in range(1,11):
# the number of nodes in hidden layer
NODENUM_HL = i
trainData = trainData.dropna(axis = 0, how = ’any’,\
thresh = None, subset = None, inplace = False)
train_Data = trainData.as_matrix(columns = [’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
result_Data = trainData.as_matrix(columns = [’Temp_1’])
maxValue = max(result_Data)
minValue = min(result_Data)
result_Data = (result_Data - minValue) / (maxValue - minValue)
(w0To1, w1To2) = learnProcess(train_Data, result_Data,i)
node_err_temp=node_err
print node_err_temp
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 150000)
(’Final square error:’, 18.325206650737716)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 150000)
(’Final square error:’, 18.104710918933801)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 18278)
(’Final square error:’, 9.4791315777844964)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 32415)
(’Final square error:’, 9.4246533736084572)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 24781)
(’Final square error:’, 9.2209067593040288)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 21527)
(’Final square error:’, 9.449877602213494)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 14510)
(’Final square error:’, 9.9136854026428249)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 29072)
(’Final square error:’, 9.3574724636432656)
(’Inital square error:’, 274.2273341836725)
11
(’Total iteration:’, 18788)
(’Final square error:’, 9.540072102029777)
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 36994)
(’Final square error:’, 9.1961200425312981)
[[ 1. 18.32520665]
[ 2. 18.10471092]
[ 3. 9.47913158]
[ 4. 9.42465337]
[ 5. 9.22090676]
[ 6. 9.4498776 ]
[ 7. 9.9136854 ]
[ 8. 9.35747246]
[ 9. 9.5400721 ]
[ 10. 9.19612004]]
In [211]: node_err_temp_pd = pd.DataFrame(node_err_temp\
,columns={’No_of_Nodes’,’Error’})
node_err_temp_pd
Out[211]: No of Nodes Error
0 1 18.325207
1 2 18.104711
2 3 9.479132
3 4 9.424653
4 5 9.220907
5 6 9.449878
6 7 9.913685
7 8 9.357472
8 9 9.540072
9 10 9.196120
Now, we visualize the error with respect to hidden layer size to determine the size that results in theleast error.
In [420]: plot_5=node_err_temp_pd.plot(x=’No_of_Nodes’,y=’Error’,figsize=[20,8])
plot_5.set_ylabel(’Error’, color=’b’)
Out[420]: <matplotlib.text.Text at 0x112d3fd10>
12
Therefore, we find that a 5 node hidden layer results in the minimum error for training.
Setting up Testing Data - Temperature
Recalibration of Weights After Considering Optimal Nodal Design We now recalibrate weightsfor the optimal hidden layer nodal design.
In [260]: # learning rate
LEARNING_RATE = 0.001 #Gradient descent step size
RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization
# iteration number
ITERATION_NUM = 150000
node_err=(10,2)
node_err=np.ones(node_err)
# the number of nodes in hidden layer
NODENUM_HL = 5
trainData = trainData.dropna(axis = 0, how = ’any’,\
thresh = None, subset = None, inplace = False)
train_Data = trainData.as_matrix(columns = [’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
result_Data = trainData.as_matrix(columns = [’Temp_1’])
maxValue = max(result_Data)
minValue = min(result_Data)
result_Data = (result_Data - minValue) / (maxValue - minValue)
(w0To1, w1To2) = learnProcess(train_Data, result_Data,NODENUM_HL)
print node_err
(’Inital square error:’, 274.2273341836725)
(’Total iteration:’, 12772)
(’Final square error:’, 10.157420969948955)
[[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 5. 10.15742097]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]]
Accessing roughly 10% of the data, which spans from 05/02/2016 to 05/04/2016, and assigning it to thetest data variable. Testing will be carried out on this variable.
In [261]: testData=traintestData_raw.loc[’20160502’:’20160504’]
testData.head()
Out[261]: Solar Radiation Humidity Temperature Wind Speed \2016-05-02 00:00:00 1 91.95 60.25 0
2016-05-02 00:20:00 1 92.60 60.15 0
2016-05-02 00:30:00 1 92.95 60.05 0
13
2016-05-02 00:40:00 1 93.25 59.90 0
2016-05-02 00:50:00 1 93.70 59.90 0
Temp 1 Hum 1
2016-05-02 00:00:00 79.70 37.0
2016-05-02 00:20:00 79.52 36.9
2016-05-02 00:30:00 81.86 36.7
2016-05-02 00:40:00 79.70 37.0
2016-05-02 00:50:00 79.52 36.1
We now propogate this test data forward through the ANN for the optimised weights, and find loss overthe normalized parameters.
In [262]: resultExamples_temptest=testData.as_matrix(columns=[’Temp_1’])
maxValue = max(resultExamples_temptest)
minValue = min(resultExamples_temptest)
resultExamples_temptest = (resultExamples_temptest - minValue) / (maxValue - minValue)
testData = testData.as_matrix(columns = \
[’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
layer0 = testData
layer1 = sigmoid(np.dot(testData, w0To1))
layer2 = sigmoid(np.dot(layer1, w1To2))
oError = resultExamples_temptest - layer2
curError = getSquareError(oError)
print("Final square error:", curError)
(’Final square error:’, 9.6640104462032959)
We now map out the final paramter outputs predicted by the ANN, against the actual values as presentin the data.
In [263]: temp_ANN_pred=pd.DataFrame({’Predicted_Temp’:layer2[:,0],’Actual_Temp’:resultExamples_temptest[:,0]})
temp_ANN_pred.head()
Out[263]: Actual Temp Predicted Temp
0 0.40 0.404686
1 0.35 0.402527
2 1.00 0.401133
3 0.40 0.399618
4 0.35 0.398476
In [264]: testData=traintestData_raw.loc[’20160502’:’20160504’]
trainData=traintestData_raw.loc[’20160422’:’20160501’]
#Reinitializes test and train data to a dataframe
In [265]: temp_ANN_pred.index=testData.index
#Assigns timestamp index
We now visualize the actual versus predicted normalized values as output by the ANN. Surprisinly, thefit is a poor approximation of the actual paramter values. This could possibly be due to overfitting duringtesting, given that testing accuracy over a 5 hidden layer design was far superior to the accuracy observedhere.
14
In [424]: plot_ANNTemp=temp_ANN_pred.plot(figsize=[20,8])
plot_ANNTemp.set_ylabel(’Internal Temperature (Normalized)’, color=’b’)
plot_ANNTemp.set_xlabel(’Time Series’, color=’b’)
Out[424]: <matplotlib.text.Text at 0x118bb31d0>
Training for the humidity, we follow the same process as that used for temperature training.
Humidity Training
In [235]: # learning rate
LEARNING_RATE = 0.001 #Gradient descent step size
RANDOM_RAGE = 0.1
# the number of nodes in hidden layer
NODENUM_HL = 5
# iteration number
ITERATION_NUM = 150000
node_err=(10,2)
node_err=np.ones(node_err)
for i in range(1,11):
# the number of nodes in hidden layer
NODENUM_HL = i
trainData = trainData.dropna(axis = 0, how = ’any’,\
thresh = None, \
subset = None, inplace = False)
train_Data = trainData.as_matrix(columns = \
[’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
result_Data = trainData.as_matrix(columns = [’Hum_1’])
maxValue = max(result_Data)
minValue = min(result_Data)
result_Data = (result_Data - minValue) / (maxValue - minValue)
(w0To1, w1To2) = learnProcess(train_Data, result_Data,i)
node_err_hum=node_err
print node_err_hum
15
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 15828)
(’Final square error:’, 27.167038933471073)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 34874)
(’Final square error:’, 25.487413711592971)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2306)
(’Final square error:’, 26.067411243902523)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2557)
(’Final square error:’, 21.480609235375173)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2406)
(’Final square error:’, 22.163900227937674)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2222)
(’Final square error:’, 21.47789644120682)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2798)
(’Final square error:’, 23.386632874286729)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2970)
(’Final square error:’, 23.648404669961021)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2426)
(’Final square error:’, 18.294206465652621)
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2463)
(’Final square error:’, 18.24197609913535)
[[ 1. 27.16703893]
[ 2. 25.48741371]
[ 3. 26.06741124]
[ 4. 21.48060924]
[ 5. 22.16390023]
[ 6. 21.47789644]
[ 7. 23.38663287]
[ 8. 23.64840467]
[ 9. 18.29420647]
[ 10. 18.2419761 ]]
In [236]: node_err_hum_pd = pd.DataFrame(node_err_hum,columns={’No. of Nodes’,’Error’})
node_err_hum_pd
Out[236]: No. of Nodes Error
0 1 27.167039
1 2 25.487414
2 3 26.067411
3 4 21.480609
4 5 22.163900
5 6 21.477896
6 7 23.386633
7 8 23.648405
8 9 18.294206
9 10 18.241976
16
In [418]: plot_6=node_err_hum_pd.plot(x=’No. of Nodes’,y=’Error’,figsize=[20,8])
plot_6.set_ylabel(’Error’, color=’b’)
Out[418]: <matplotlib.text.Text at 0x118b8b510>
Setting up Testing Data - Humidity
Recalibration of Weights After Considering Optimal Nodal Design We now recalibrate weightsfor the optimal hidden layer nodal design.
In [280]: # learning rate
LEARNING_RATE = 0.001 #Gradient descent step size
RANDOM_RAGE = 0.1 #Defines a range limitation for random initialization
# iteration number
ITERATION_NUM = 150000
node_err=(10,2)
node_err=np.ones(node_err)
# the number of nodes in hidden layer
NODENUM_HL = 10
trainData = trainData.dropna(axis = 0, how = ’any’,\
thresh = None, subset = None, inplace = False)
train_Data = trainData.as_matrix(columns = [’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
result_Data = trainData.as_matrix(columns = [’Hum_1’])
maxValue = max(result_Data)
minValue = min(result_Data)
result_Data = (result_Data - minValue) / (maxValue - minValue)
(w0To1, w1To2) = learnProcess(train_Data, result_Data,NODENUM_HL)
print node_err
(’Inital square error:’, 269.30661186481473)
(’Total iteration:’, 2919)
(’Final square error:’, 17.786512335549215)
17
[[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 1. 1. ]
[ 10. 17.78651234]]
We now propogate test data forward through the ANN for the optimised weights, and find loss over thenormalized parameters.
In [281]: resultExamples_temphum=testData.as_matrix(columns=[’Hum_1’])
maxValue = max(resultExamples_temphum)
minValue = min(resultExamples_temphum)
resultExamples_temphum = (resultExamples_temphum - minValue) / (maxValue - minValue)
testData = testData.as_matrix(columns = \
[’Temperature’, ’Humidity’,\
’Wind_Speed’, ’Solar_Radiation’])
layer0 = testData
layer1 = sigmoid(np.dot(testData, w0To1))
layer2 = sigmoid(np.dot(layer1, w1To2))
oError = resultExamples_temphum - layer2
curError = getSquareError(oError)
print("Final square error:", curError)
(’Final square error:’, 29.219680611924691)
We now map out the final paramter outputs predicted by the ANN, against the actual values as presentin the data.
In [282]: hum_ANN_pred=pd.DataFrame({’Predicted_Hum’:layer2[:,0],’Actual_Hum’:resultExamples_temphum[:,0]})
hum_ANN_pred.head()
Out[282]: Actual Hum Predicted Hum
0 0.878505 0.573453
1 0.869159 0.575152
2 0.850467 0.575898
3 0.878505 0.576313
4 0.794393 0.577679
In [283]: testData=traintestData_raw.loc[’20160502’:’20160504’]
trainData=traintestData_raw.loc[’20160422’:’20160501’]
#Reinitializes test and train data to a dataframe
In [284]: hum_ANN_pred.index=testData.index
#Assigns timestamp index
Again, the ANN outputs a very innacurate prediction trend. This is again a possible side effect ofoverfitting, given that ten nodes were considered, and cross-validation was not carried out.
18
In [425]: plot_ANNhum=hum_ANN_pred.plot(figsize=[20,8])
plot_ANNhum.set_ylabel(’Internal Humidity (Normalized)’, color=’b’)
plot_ANNhum.set_xlabel(’Time Series’, color=’b’)
Out[425]: <matplotlib.text.Text at 0x119720c10>
Linear Regression (Causal Inference)
Having completed the ANN design and prediction, we now seek to implement a Linear Regression modelprimarily for causal inference, but also as a learning expriment in predictive modeling. First, we seek tounderstand the relationship between predictor variables (Inputs), and response variables. (Output) In orderto do this, we first visualize the relationship of the input variables with each one of internal temperature andhumidity.
In [47]: plot_7=trainData.plot(kind=’scatter’,x=’Temp_1’,y=’Solar_Radiation’,figsize=[20,8])
In [48]: plot_8=trainData.plot(kind=’scatter’,x=’Temp_1’,y=’Temperature’,figsize=[20,8])
19
In [49]: plot_9=trainData.plot(kind=’scatter’,x=’Temp_1’,y=’Humidity’,figsize=[20,8])
In [50]: plot_10=trainData.plot(kind=’scatter’,x=’Temp_1’,y=’Wind_Speed’,figsize=[20,8])
20
In [51]: plot_11=trainData.plot(kind=’scatter’,x=’Hum_1’,y=’Solar_Radiation’,figsize=[20,8])
In [52]: plot_12=trainData.plot(kind=’scatter’,x=’Hum_1’,y=’Temperature’,figsize=[20,8])
In [53]: plot_13=trainData.plot(kind=’scatter’,x=’Hum_1’,y=’Humidity’,figsize=[20,8])
21
In [54]: plot_14=trainData.plot(kind=’scatter’,x=’Hum_1’,y=’Wind_Speed’,figsize=[20,8])
The relationships between the predictor variables and Hum 1/Temp 1, therefore, seem to be based on ahigher order multinomial relationship. We will conduct tests on polynomial regression models of differentorders to determine the one that gives us the best fit.
Linear Regression
The objective here, is to estimate Temperature and Humidity given predictor variables values. The equationis as follows:
Internal Temperature=a0+a1(Solar Radiation)+a2(Temperature)+a3(Humidity)+a4(Wind Speed)Internal Humidity=a0+a1(Solar Radiation)+a2(Temperature)+a3(Humidity)+a4(Wind Speed)
In [304]: testData=traintestData_raw.loc[’20160502’:’20160504’]
trainData=traintestData_raw.loc[’20160422’:’20160501’]
In [305]: trainData_pred=trainData[[’Solar_Radiation’,’Temperature’,’Humidity’,’Wind_Speed’]]
trainData_pred.head()
22
Out[305]: Solar Radiation Temperature Humidity Wind Speed
2016-04-22 23:30:00 1 60.00 89.25 2.55
2016-04-22 23:40:00 1 59.80 89.30 4.20
2016-04-22 23:50:00 1 59.65 88.85 3.90
2016-04-23 00:00:00 1 59.50 88.45 3.35
2016-04-23 00:10:00 1 59.40 87.90 3.40
In [306]: trainData_pred.insert(0, ’Ones’, 1, allow_duplicates=True)
#Adding a constant term that would correspond to a0
trainData_pred.head()
Out[306]: Ones Solar Radiation Temperature Humidity Wind Speed
2016-04-22 23:30:00 1 1 60.00 89.25 2.55
2016-04-22 23:40:00 1 1 59.80 89.30 4.20
2016-04-22 23:50:00 1 1 59.65 88.85 3.90
2016-04-23 00:00:00 1 1 59.50 88.45 3.35
2016-04-23 00:10:00 1 1 59.40 87.90 3.40
We now convert the training data into a numpy matrix to perform subsequent matrix operations.
In [307]: trainData_pred=trainData_pred.as_matrix(columns=None)
#Converting to a Numpy Matrix
trainData_pred
Out[307]: array([[ 1. , 1. , 60. , 89.25, 2.55],
[ 1. , 1. , 59.8 , 89.3 , 4.2 ],
[ 1. , 1. , 59.65, 88.85, 3.9 ],
...,
[ 1. , 1. , 60.15, 92.4 , 0. ],
[ 1. , 1. , 60.55, 91.65, 0.55],
[ 1. , 1. , 60.45, 91.45, 0. ]])
Temperature Tests We now do the same with training data output for the temperature, and convertthis vector of values into a numpy array.
In [308]: trainData_out_temp=trainData[[’Temp_1’]]
#Accesses ’Temp_1’, which gives us the internal temperature in Hunt Library
trainData_out_temp.head()
Out[308]: Temp 1
2016-04-22 23:30:00 78.44
2016-04-22 23:40:00 79.88
2016-04-22 23:50:00 79.70
2016-04-23 00:00:00 79.88
2016-04-23 00:10:00 79.88
In [309]: trainData_out_temp=trainData_out_temp.as_matrix(columns=None)
#Converts to Matrix
trainData_out_temp[:10]
Out[309]: array([[ 78.44],
[ 79.88],
[ 79.7 ],
[ 79.88],
[ 79.88],
[ 79.7 ],
23
[ 79.7 ],
[ 79.7 ],
[ 79.88],
[ 79.7 ]])
First Order Multivariate Regression We now seek to perform first order multivariate linear regres-sion. Multivariate regression offers a convenient closed form solution (Stated below) that would allow us todetermine the coefficients corresponding to each input variable.
Solution: A=[(X’X)ˆ-1]X’YWhere A is the vector of coefficients corresponding to each predictor variable. X is the matrix of predictor
variables, and Y is the corresponding vector of outputs observed in each training example.
LstSq Function We use the np.linalg.lstsq(input,output) function to determine the value of the coef-ficients. This function takes the predictor and response variables as inputs, and outputs an array of valuesconsisting of the preditor variable coefficients (A), l2 loss (Which is equal to the sum of the squared differencebetween predicted Y, and actual Y), Rank of the Matrix, and Singular Values of the predictor variables.
In [310]: a_funct_temp_o1 = np.linalg.lstsq(trainData_pred, trainData_out_temp)
In [311]: a_funct_temp_o1
# Returns coeff a values in index 0, l2 Norm Loss in index 1,
#rank of matrix in index 2, and 3 is singular values of trainData_pred
Out[311]: (array([[ 7.78318038e+01],
[ 5.28436528e-04],
[ 3.74469693e-02],
[ -8.06533948e-03],
[ 6.81049727e-02]]),
array([ 237.71098444]),
5,
array([ 1.02197701e+04, 2.65591549e+03, 4.77501272e+02,
4.75429041e+01, 2.99992583e+00]))
Next, we visualize the predicted values by calculating the dot product between the coefficients andpredictor variables.
In [312]: testData=traintestData_raw.loc[’20160502’:’20160504’]
In [313]: testData_pred=testData[[’Solar_Radiation’,’Temperature’,’Humidity’,’Wind_Speed’]]
testData_pred.insert(0, ’Ones’, 1, allow_duplicates=True)
testData_pred.head()
Out[313]: Ones Solar Radiation Temperature Humidity Wind Speed
2016-05-02 00:00:00 1 1 60.25 91.95 0
2016-05-02 00:20:00 1 1 60.15 92.60 0
2016-05-02 00:30:00 1 1 60.05 92.95 0
2016-05-02 00:40:00 1 1 59.90 93.25 0
2016-05-02 00:50:00 1 1 59.90 93.70 0
In [314]: temp_predvect_o1=np.dot(testData_pred,a_funct_temp_o1[0])
#Accessing coefficient vector from index, and calculating dot product with trainingdata
temp_predvect_o1
Out[314]: array([[ 79.34690418],
[ 79.33791701],
[ 79.33134944],
24
[ 79.3233128 ],
[ 79.31968339],
[ 79.31539144],
[ 79.31150276],
[ 79.31029296],
[ 79.30507918],
[ 79.29718652],
[ 79.28914987],
[ 79.32425182],
[ 79.29886583],
[ 79.29203899],
[ 79.28253326],
[ 79.27703151],
[ 79.29694687],
[ 79.25171177],
[ 79.22445343],
[ 79.22056474],
[ 79.21975821],
[ 79.21105902],
[ 79.21319064],
[ 79.20863941],
[ 79.21077104],
[ 79.61187879],
[ 79.26911186],
[ 79.30771958],
[ 79.18819643],
[ 79.22736217],
[ 79.20318598],
[ 79.20131363],
[ 79.24404897],
[ 79.16572825],
[ 79.14529676],
[ 79.1601112 ],
[ 79.18281831],
[ 79.13993499],
[ 79.14898687],
[ 79.15261534],
[ 79.16797457],
[ 79.25506473],
[ 79.27305682],
[ 79.3107812 ],
[ 79.39463036],
[ 79.37939298],
[ 79.3910607 ],
[ 79.39539829],
[ 79.46881356],
[ 79.50703061],
[ 79.51937309],
[ 79.70004373],
[ 79.76789669],
[ 79.69449618],
[ 80.11824887],
[ 79.8903142 ],
[ 80.08596081],
25
[ 80.15221872],
[ 80.10932833],
[ 80.05593112],
[ 79.93274299],
[ 80.14178259],
[ 79.96191569],
[ 80.20196463],
[ 80.06097989],
[ 80.48410712],
[ 80.4318415 ],
[ 80.21346672],
[ 80.33765434],
[ 80.4883458 ],
[ 80.34693215],
[ 80.20871979],
[ 80.12441679],
[ 80.27693977],
[ 80.70493373],
[ 80.54203435],
[ 80.22188307],
[ 80.28132491],
[ 80.29794114],
[ 80.17650734],
[ 80.25415 ],
[ 80.09821907],
[ 80.21927044],
[ 80.36092161],
[ 80.50257278],
[ 80.66340819],
[ 80.44121225],
[ 80.25630045],
[ 80.18556066],
[ 80.21896857],
[ 80.25155886],
[ 80.11445035],
[ 80.06440789],
[ 80.12072441],
[ 80.08877693],
[ 80.09046756],
[ 80.06551353],
[ 80.21321918],
[ 80.34039133],
[ 80.28906294],
[ 80.16840498],
[ 80.12394717],
[ 80.02218974],
[ 80.0482404 ],
[ 79.9931941 ],
[ 79.85041514],
[ 79.90190421],
[ 79.80089797],
[ 79.77046041],
[ 79.78984116],
[ 79.71039852],
26
[ 79.71630537],
[ 79.66299079],
[ 79.5713173 ],
[ 79.5760427 ],
[ 79.577866 ],
[ 79.4624932 ],
[ 79.40652125],
[ 79.38820589],
[ 79.38064433],
[ 79.35456467],
[ 79.37634995],
[ 79.42723322],
[ 79.37718761],
[ 79.40807778],
[ 79.34086193],
[ 79.40065463],
[ 79.42653883],
[ 79.39766742],
[ 79.41288513],
[ 79.37626505],
[ 79.35033181],
[ 79.34042281],
[ 79.31088886],
[ 79.27026881],
[ 79.35812638],
[ 79.38996707],
[ 79.32084375],
[ 79.25344879],
[ 79.21852595],
[ 79.25558042],
[ 79.21210237],
[ 79.20876094],
[ 79.18337496],
[ 79.22196232],
[ 79.24504 ],
[ 79.212747 ],
[ 79.18820267],
[ 79.2004184 ],
[ 79.19399482],
[ 79.22462571],
[ 79.19468849],
[ 79.19276709],
[ 79.16136079],
[ 79.18937661],
[ 79.21934496],
[ 79.15408163],
[ 79.15267637],
[ 79.15337003],
[ 79.05747582],
[ 79.17259173],
[ 79.07298151],
[ 79.13960105],
[ 79.0986579 ],
[ 79.05811801],
27
[ 79.03126294],
[ 78.99930939],
[ 79.0370264 ],
[ 79.02642374],
[ 79.09799778],
[ 79.05210013],
[ 79.03005557],
[ 79.00700901],
[ 79.01833804],
[ 79.04020592],
[ 79.03136274],
[ 78.96843884],
[ 78.96495343],
[ 79.00347698],
[ 79.00534933],
[ 79.00868181],
[ 79.08307315],
[ 78.99484126],
[ 78.97027568],
[ 78.95181395],
[ 78.99070283],
[ 78.98758539],
[ 79.0016334 ],
[ 79.03723297],
[ 79.13325895],
[ 79.08013459],
[ 79.05781632],
[ 79.06962355],
[ 79.13400259],
[ 79.18527959],
[ 79.20910531],
[ 79.26562346],
[ 79.28680268],
[ 79.26742675],
[ 79.23023604],
[ 79.25417177],
[ 79.35919177],
[ 79.24682934],
[ 79.30971346],
[ 79.32407746],
[ 79.34994586],
[ 79.40644042],
[ 79.46464989],
[ 79.58652783],
[ 79.58139454],
[ 79.81695539],
[ 79.83357168],
[ 79.82909385],
[ 79.82456317],
[ 79.82008534],
[ 79.81601078],
[ 79.81150652],
[ 79.80887462],
[ 79.81007765],
28
[ 79.80900507],
[ 79.80447439],
[ 79.79999656],
[ 79.795922 ],
[ 79.79144417],
[ 79.78691349],
[ 79.78283893],
[ 79.7783611 ],
[ 79.77385684],
[ 79.76935259],
[ 79.76527803],
[ 79.75667279],
[ 79.7556002 ],
[ 79.75452762],
[ 79.75002337],
[ 79.74592238],
[ 79.74144455],
[ 79.73696672],
[ 79.73658401],
[ 79.73210618],
[ 79.72762835],
[ 79.72312409],
[ 79.71902311],
[ 79.71454528],
[ 79.71006744],
[ 79.70594004],
[ 79.7014622 ],
[ 79.70379487],
[ 79.69931704],
[ 79.69518963],
[ 79.6907118 ],
[ 79.68623397],
[ 79.68213298],
[ 79.67762873],
[ 79.6731509 ],
[ 79.66867307],
[ 79.66006783],
[ 79.65933469],
[ 79.65523371],
[ 79.65072945],
[ 79.64965687],
[ 79.64858429],
[ 79.64445688],
[ 79.63997905],
[ 79.63590448],
[ 79.63140023],
[ 79.62689598],
[ 79.62241815],
[ 79.61834358],
[ 79.61381291],
[ 79.60933508],
[ 79.60526051],
[ 79.60078268],
[ 79.59625201],
29
[ 79.59517942],
[ 79.59451011],
[ 79.59000585],
[ 79.58737395],
[ 79.58517173],
[ 79.5806939 ],
[ 79.57616323],
[ 79.5716854 ],
[ 79.58699319],
[ 79.50674865],
[ 79.49937455],
[ 79.47448676],
[ 79.44722843],
[ 79.47880983],
[ 79.44853317],
[ 79.44130305],
[ 79.420353 ],
[ 79.41606105],
[ 79.42831191],
[ 79.46764442],
[ 79.44653889],
[ 79.33122995],
[ 79.35675992],
[ 79.35874757],
[ 79.33857536],
[ 79.36725373],
[ 79.28494032],
[ 79.21546258],
[ 79.18250946],
[ 79.15445345],
[ 79.10102032],
[ 79.10822267],
[ 79.10698511],
[ 79.11782581],
[ 79.12048281],
[ 79.11271224],
[ 79.1161699 ],
[ 79.12903681],
[ 79.1436166 ],
[ 79.16086459],
[ 79.19275149],
[ 79.21052325],
[ 79.24954462],
[ 79.27028716],
[ 79.30779705],
[ 79.31710105],
[ 79.40340592],
[ 79.40650629],
[ 79.3654009 ],
[ 79.41720828],
[ 79.48020233],
[ 79.60245016],
[ 79.61562306],
[ 79.60301428],
30
[ 79.75558508],
[ 79.96536001],
[ 79.94097236],
[ 79.88846493],
[ 79.82935887],
[ 79.73985661],
[ 79.71808684],
[ 79.7758631 ],
[ 79.80494026],
[ 79.90564227],
[ 79.93839243],
[ 80.1464917 ],
[ 80.20490626],
[ 80.30048201],
[ 80.37916636],
[ 79.9793115 ],
[ 80.33445582],
[ 80.25016875],
[ 80.28719736],
[ 80.08509193],
[ 80.02731582],
[ 80.11145704],
[ 80.16450468]])
In [315]: testData_out_temp=testData[[’Temp_1’]]
testData_out_temp=testData_out_temp.as_matrix(columns=None)
We then store the actual and predicted values in a pandas dataframe, and assign the timestamp indexof the traindata to this dataframe.
In [316]: o1_predcomp_temp=pd.DataFrame({’Predicted_Temp’:temp_predvect_o1[:,0],’Actual_Temp’:testData_out_temp[:,0]}) #trainData_out_temp
o1_predcomp_temp.head()
Out[316]: Actual Temp Predicted Temp
0 79.70 79.346904
1 79.52 79.337917
2 81.86 79.331349
3 79.70 79.323313
4 79.52 79.319683
In [317]: o1_predcomp_temp.index=testData.index
o1_predcomp_temp.head()
Out[317]: Actual Temp Predicted Temp
2016-05-02 00:00:00 79.70 79.346904
2016-05-02 00:20:00 79.52 79.337917
2016-05-02 00:30:00 81.86 79.331349
2016-05-02 00:40:00 79.70 79.323313
2016-05-02 00:50:00 79.52 79.319683
We then use the linalg.norm function to find the norm between the two columns.
In [318]: numpy.linalg.norm(o1_predcomp_temp.Actual_Temp-o1_predcomp_temp.Predicted_Temp)
Out[318]: 8.3417961142492878
31
Now, we plot the two variables in the dataframe to visualize and compare the actual vs predicted valuesfor temperature in that time span. Clearly, there are significant issues with the accuracy of prediction. Thiscould be due to either one of two factors:
Error Source 1:The model does not allow for sufficient flexibility, in terms of allowing for higher order relationships
between predictor and response variables. A component of error is likely due to this, given the observednon-linear relationship between individual predictors and response variables that were visualized earlier.
Error Source 2:It is possible that the subset of predictors chosen is not an exhaustive set of factors influencing internal
temperature. We know for a fact that internal HVAC data from Hunt Library is likely an important factorin determining the final internal temperature. This data has been difficult to obtain, and therefore, is notfactored into the model.
In [426]: plot_15=o1_predcomp_temp.plot(figsize=[20,8])
plot_15.set_ylabel(’Internal Temperature’, color=’b’)
plot_15.set_xlabel(’Time Series’, color=’b’)
Out[426]: <matplotlib.text.Text at 0x1197a9ad0>
Matrix Multiplication (To Understand Underlying Mechanism) We now perform a matrixmultiplication to verify the output of the linalg.lstsq() function used earlier. This computation uses methodsthat allow us to inverse (linalg.inv), transpose (np.transpose()), and calculate dot products (np.dot) toevaluate the closed form solution for coefficients specified at the start of this section. We observe that thecoefficients calculated using this method are identical to the coefficients calculated by the linalg.lstsq method.
In [320]: a_mult_temp=np.dot(np.dot(linalg.inv(np.dot(trainData_pred.transpose(),\
trainData_pred)),trainData_pred.transpose())\
,trainData_out_temp)
#Computes coefficient vector
In [321]: a_mult_temp
Out[321]: array([[ 7.78318038e+01],
[ 5.28436528e-04],
[ 3.74469693e-02],
[ -8.06533948e-03],
[ 6.81049727e-02]])
32
Humidity Tests
In [322]: testData=traintestData_raw.loc[’20160502’:’20160504’]
trainData=traintestData_raw.loc[’20160422’:’20160501’]
We now carry out identical tests for humidity, and observe a similar fit, with considerable loss over thepredictions. This strengthens our belief in the need for a higher order representation, in addition to the needto get more expressive set of predictor variables, especially for temperature.
In [323]: trainData_out_hum=trainData[[’Hum_1’]]
trainData_out_hum.head()
Out[323]: Hum 1
2016-04-22 23:30:00 36.2
2016-04-22 23:40:00 36.4
2016-04-22 23:50:00 36.4
2016-04-23 00:00:00 36.3
2016-04-23 00:10:00 36.2
In [324]: trainData_out_hum=trainData_out_hum.as_matrix(columns=None)
trainData_out_hum[:10]
Out[324]: array([[ 36.2],
[ 36.4],
[ 36.4],
[ 36.3],
[ 36.2],
[ 35.8],
[ 36. ],
[ 35.5],
[ 34.4],
[ 34.1]])
First Order Multivariate Regression
LstSq Function
In [328]: a_funct_hum_o1 = np.linalg.lstsq(trainData_pred, trainData_out_hum)
In [329]: a_funct_hum_o1
Out[329]: (array([[ -1.21793925e+01],
[ -2.72344512e-03],
[ 4.22477246e-01],
[ 2.46070242e-01],
[ 2.18038405e-01]]),
array([ 2445.52601234]),
5,
array([ 1.02197701e+04, 2.65591549e+03, 4.77501272e+02,
4.75429041e+01, 2.99992583e+00]))
In [330]: hum_predvect_o1=np.dot(testData_pred,a_funct_hum_o1[0])
#Prediction basing on estimated params
hum_predvect_o1
33
Out[330]: array([[ 35.8982968 ],
[ 36.01599474],
[ 36.0598716 ],
[ 36.07032108],
[ 36.18105269],
[ 36.2337499 ],
[ 36.2741436 ],
[ 36.31105413],
[ 36.23537904],
[ 36.1631871 ],
[ 36.17363658],
[ 36.32372946],
[ 36.25298075],
[ 36.2265197 ],
[ 36.20354181],
[ 36.29314956],
[ 36.40054228],
[ 36.36140855],
[ 36.26953598],
[ 36.30992968],
[ 36.3345367 ],
[ 36.28695179],
[ 36.37841356],
[ 36.36077286],
[ 36.45223464],
[ 37.6666552 ],
[ 36.6973361 ],
[ 37.06893118],
[ 36.65601817],
[ 36.63011219],
[ 36.52245295],
[ 36.50132908],
[ 36.65327599],
[ 36.40253182],
[ 36.3371203 ],
[ 36.33916024],
[ 36.41339211],
[ 36.22120922],
[ 36.26687841],
[ 36.35072384],
[ 36.47675531],
[ 36.69623399],
[ 36.59827206],
[ 36.68081074],
[ 36.68982335],
[ 36.76854953],
[ 37.29483945],
[ 37.30345408],
[ 37.56283268],
[ 37.43793681],
[ 37.35426826],
[ 36.97117162],
[ 36.42595891],
[ 37.17682566],
34
[ 35.9222613 ],
[ 36.5975405 ],
[ 35.67540604],
[ 35.58742006],
[ 35.44313637],
[ 35.68849076],
[ 35.81800849],
[ 34.64394173],
[ 35.68790225],
[ 36.37173261],
[ 35.52839523],
[ 35.09203774],
[ 35.021855 ],
[ 35.80800337],
[ 35.90272462],
[ 34.33205902],
[ 34.99478911],
[ 34.56428782],
[ 34.70258417],
[ 34.75613097],
[ 33.23511784],
[ 33.57388606],
[ 34.11157751],
[ 34.25398761],
[ 33.78010773],
[ 33.77699189],
[ 33.52601006],
[ 33.92613641],
[ 33.29603064],
[ 33.18268712],
[ 33.06934361],
[ 32.00373588],
[ 32.89558512],
[ 32.84889912],
[ 32.85205531],
[ 33.46853075],
[ 33.12936645],
[ 33.06008697],
[ 33.03209316],
[ 33.27621492],
[ 32.99408562],
[ 32.4529855 ],
[ 32.25637029],
[ 32.24287682],
[ 32.06436855],
[ 31.73694132],
[ 32.32331589],
[ 32.70147578],
[ 32.23721654],
[ 32.17112234],
[ 31.99040091],
[ 31.86701772],
[ 32.09240158],
[ 31.95585968],
35
[ 31.80509085],
[ 31.80394936],
[ 31.76007962],
[ 32.01134327],
[ 32.04452487],
[ 31.74963094],
[ 31.82624352],
[ 31.83339576],
[ 31.79056193],
[ 32.11193899],
[ 32.37576866],
[ 32.45396237],
[ 32.378143 ],
[ 32.32400256],
[ 32.37354414],
[ 32.2464318 ],
[ 32.41022978],
[ 32.31124927],
[ 32.46474204],
[ 32.42657418],
[ 32.38391566],
[ 32.37365208],
[ 32.1761602 ],
[ 32.20035642],
[ 32.18968205],
[ 32.08898912],
[ 32.02384678],
[ 32.2446051 ],
[ 32.30115463],
[ 31.98907882],
[ 31.78076828],
[ 31.74000542],
[ 31.87223006],
[ 31.70124086],
[ 31.64668962],
[ 31.57594091],
[ 31.69794361],
[ 31.75055754],
[ 31.69233989],
[ 31.7889556 ],
[ 31.75548659],
[ 31.71672202],
[ 31.8101821 ],
[ 31.72179269],
[ 31.68669719],
[ 31.56488041],
[ 31.61817433],
[ 31.6536 ],
[ 31.53675124],
[ 31.45967455],
[ 31.46474521],
[ 31.25443638],
[ 31.56706727],
[ 31.07888996],
36
[ 31.25884378],
[ 31.16701284],
[ 31.06287839],
[ 30.95870231],
[ 30.93183046],
[ 31.12208949],
[ 31.10634445],
[ 31.29163546],
[ 31.19446734],
[ 31.17826986],
[ 31.07269219],
[ 31.10282231],
[ 31.19234678],
[ 31.22740329],
[ 31.07572557],
[ 31.10381575],
[ 31.26946776],
[ 31.29059162],
[ 31.29396508],
[ 31.45209147],
[ 31.20229269],
[ 31.0864151 ],
[ 31.09275122],
[ 31.05300836],
[ 30.94988206],
[ 30.93932287],
[ 30.93233933],
[ 31.04396869],
[ 30.88530282],
[ 30.95438699],
[ 30.6273868 ],
[ 30.6497562 ],
[ 30.58765663],
[ 30.65019166],
[ 30.68220282],
[ 30.25538343],
[ 30.4411413 ],
[ 30.44303613],
[ 30.38813598],
[ 30.34602687],
[ 30.04988068],
[ 29.99602766],
[ 29.63268318],
[ 29.1394627 ],
[ 29.26489642],
[ 28.82974692],
[ 28.66439441],
[ 29.15105908],
[ 28.44367351],
[ 28.01067542],
[ 28.01330285],
[ 28.01620262],
[ 28.01883005],
[ 28.00915396],
37
[ 28.01191756],
[ 28.03580502],
[ 28.05815472],
[ 28.07168407],
[ 28.07458384],
[ 28.07721127],
[ 28.06753518],
[ 28.07016261],
[ 28.07306238],
[ 28.06338629],
[ 28.06601372],
[ 28.06877732],
[ 28.07154092],
[ 28.06186484],
[ 28.05508852],
[ 28.06861787],
[ 28.08214722],
[ 28.08491082],
[ 28.0753709 ],
[ 28.07799833],
[ 28.08062576],
[ 28.11346974],
[ 28.11609717],
[ 28.11872459],
[ 28.12148819],
[ 28.11194828],
[ 28.11457571],
[ 28.11720314],
[ 28.1077994 ],
[ 28.11042682],
[ 28.13485809],
[ 28.13748552],
[ 28.12808178],
[ 28.1307092 ],
[ 28.13333663],
[ 28.12379672],
[ 28.12656032],
[ 28.12918774],
[ 28.13181517],
[ 28.12503886],
[ 28.16991401],
[ 28.1603741 ],
[ 28.1631377 ],
[ 28.17666704],
[ 28.19019639],
[ 28.18079265],
[ 28.18342008],
[ 28.17374399],
[ 28.17650759],
[ 28.17927119],
[ 28.18189862],
[ 28.17222253],
[ 28.1751223 ],
[ 28.17774973],
38
[ 28.16807365],
[ 28.17070107],
[ 28.17360084],
[ 28.18713019],
[ 28.18835603],
[ 28.19111963],
[ 28.21500709],
[ 28.22645487],
[ 28.22908229],
[ 28.23198206],
[ 28.23460949],
[ 27.64036106],
[ 27.17186438],
[ 27.24034826],
[ 27.06068299],
[ 26.96881041],
[ 26.95502204],
[ 26.83528683],
[ 26.82112929],
[ 26.61504194],
[ 26.66773915],
[ 26.82624158],
[ 27.14994359],
[ 27.3871545 ],
[ 27.1431934 ],
[ 27.13130069],
[ 27.30540388],
[ 27.31033027],
[ 27.42165859],
[ 27.85123502],
[ 28.71899741],
[ 28.78540238],
[ 29.32838604],
[ 30.41087022],
[ 30.46607147],
[ 30.54402572],
[ 30.53967314],
[ 30.31095701],
[ 29.93089069],
[ 29.94384421],
[ 29.98017484],
[ 29.98742337],
[ 29.827249 ],
[ 29.80704368],
[ 29.66452224],
[ 29.56205929],
[ 29.34638515],
[ 29.39613887],
[ 29.29015004],
[ 29.40599354],
[ 29.12776775],
[ 29.05389954],
[ 29.23895987],
[ 28.69967206],
39
[ 28.3167044 ],
[ 28.35016483],
[ 27.88774343],
[ 27.07317147],
[ 26.5546384 ],
[ 26.65013087],
[ 27.0527206 ],
[ 26.64722419],
[ 26.91818109],
[ 27.19733539],
[ 26.92377284],
[ 26.61832226],
[ 26.72259165],
[ 26.34426268],
[ 25.73537697],
[ 26.12418297],
[ 26.0060648 ],
[ 26.36494983],
[ 26.84787143],
[ 26.16513687],
[ 26.21230991],
[ 26.08454625],
[ 26.51306297],
[ 26.49381391],
[ 26.38958072],
[ 26.57566049]])
In [331]: testData_out_hum=testData[[’Hum_1’]]
testData_out_hum=testData_out_hum.as_matrix(columns=None)
In [332]: o1_predcomp_hum=pd.DataFrame({’Predicted_Hum’:hum_predvect_o1[:,0],\
’Actual_Hum’:testData_out_hum[:,0]})
#trainData_out_temp
o1_predcomp_hum.head()
Out[332]: Actual Hum Predicted Hum
0 37.0 35.898297
1 36.9 36.015995
2 36.7 36.059872
3 37.0 36.070321
4 36.1 36.181053
In [334]: o1_predcomp_hum.index=testData.index
o1_predcomp_hum.head()
Out[334]: Actual Hum Predicted Hum
2016-05-02 00:00:00 37.0 35.898297
2016-05-02 00:20:00 36.9 36.015995
2016-05-02 00:30:00 36.7 36.059872
2016-05-02 00:40:00 37.0 36.070321
2016-05-02 00:50:00 36.1 36.181053
In [335]: numpy.linalg.norm(o1_predcomp_hum.Actual_Hum-o1_predcomp_hum.Predicted_Hum)
Out[335]: 37.057462760024826
40
The following plot visualizes the actual humidity versus the prediction.
In [427]: plot_16=o1_predcomp_hum.plot(figsize=[20,8])
plot_16.set_ylabel(’Internal Humidity’, color=’b’)
plot_16.set_xlabel(’Time Series’, color=’b’)
Out[427]: <matplotlib.text.Text at 0x113016e50>
Second Order Multivariate Regression
We now install python’s machine learning package sklearn, and use the preprocessing library to import thePolynomialFeatures, and linear model methods that would allow us to explode first order predictor variablesto higher orders.
In [342]: from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
In [343]: testData=traintestData_raw.loc[’20160502’:’20160504’]
trainData=traintestData_raw.loc[’20160422’:’20160501’]
In [344]: poly = PolynomialFeatures(degree=2)
#sets order of regression
In [345]: trainData_pred[0]
#First Training Example: 1, SR, Temp, Hum, Wind Speed
Out[345]: array([ 1. , 1. , 60. , 89.25, 2.55])
In [350]: trainData_pred_o2=poly.fit_transform(trainData_pred)
testData_pred_o2=poly.fit_transform(testData_pred)
#Transforming traindata to specified order
In [347]: trainData_pred_o2[0]
#First Training example of transformed sample in the second order (Training)
41
Out[347]: array([ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
6.00000000e+01, 8.92500000e+01, 2.55000000e+00,
1.00000000e+00, 1.00000000e+00, 6.00000000e+01,
8.92500000e+01, 2.55000000e+00, 1.00000000e+00,
6.00000000e+01, 8.92500000e+01, 2.55000000e+00,
3.60000000e+03, 5.35500000e+03, 1.53000000e+02,
7.96556250e+03, 2.27587500e+02, 6.50250000e+00])
In [351]: testData_pred_o2[0]
#First Training example of transformed sample in the second order (Testing)
Out[351]: array([ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
6.02500000e+01, 9.19500000e+01, 0.00000000e+00,
1.00000000e+00, 1.00000000e+00, 6.02500000e+01,
9.19500000e+01, 0.00000000e+00, 1.00000000e+00,
6.02500000e+01, 9.19500000e+01, 0.00000000e+00,
3.63006250e+03, 5.53998750e+03, 0.00000000e+00,
8.45480250e+03, 0.00000000e+00, 0.00000000e+00])
The exploded space, therefore, consists of 21 predictor variables, thus allowing for greater model flexibility.We would, therefore, expect a better fit with higher order regression than the results from the first ordermultivariate regression.
Temperature We now follow a method similar to the one employed for first order regression to carryout second order regression.
In [348]: a_funct_temp_o2 = np.linalg.lstsq(trainData_pred_o2, trainData_out_temp)
#Calculating coefficient vector, loss, rank, and blah
In [349]: a_funct_temp_o2
Out[349]: (array([[ 2.32951991e+01],
[ 2.32951991e+01],
[ 1.56588442e-03],
[ 7.47454034e-02],
[ 6.48933406e-02],
[ 1.18408427e-01],
[ 2.32951991e+01],
[ 1.56588416e-03],
[ 7.47454033e-02],
[ 6.48933404e-02],
[ 1.18408427e-01],
[ -9.68622210e-07],
[ -1.85052626e-05],
[ -2.54900448e-05],
[ 3.03122179e-04],
[ -2.40441190e-04],
[ -1.18427276e-03],
[ -2.83913106e-03],
[ -4.88623051e-04],
[ -2.26885555e-04],
[ -1.45039047e-02]]),
array([], dtype=float64),
15,
array([ 7.45986678e+06, 3.55188748e+05, 1.95361515e+05,
42
1.20910776e+05, 4.58000766e+04, 1.77848563e+04,
6.44400964e+03, 3.52987078e+03, 1.11685798e+03,
6.59266099e+02, 3.16040890e+02, 8.76511721e+01,
4.58666434e+01, 5.27359788e+00, 5.69081560e-01,
6.44149164e-11, 3.62070457e-11, 2.46047174e-12,
3.79561191e-13, 7.49638951e-14, 3.40372875e-14]))
In [352]: temp_predvect_o2=np.dot(testData_pred_o2,a_funct_temp_o2[0])
#Prediction basing on estimated params
temp_predvect_o2
Out[352]: array([[ 79.26109153],
[ 79.23935727],
[ 79.22706007],
[ 79.21582958],
[ 79.20119341],
[ 79.19085813],
[ 79.18212863],
[ 79.17714307],
[ 79.17745505],
[ 79.17399155],
[ 79.16276343],
[ 79.17690569],
[ 79.16082455],
[ 79.15471336],
[ 79.14493769],
[ 79.1295568 ],
[ 79.12685164],
[ 79.09121619],
[ 79.07152168],
[ 79.06268516],
[ 79.05926332],
[ 79.05292004],
[ 79.04663989],
[ 79.04262294],
[ 79.03624863],
[ 78.78605754],
[ 79.02540096],
[ 79.01354128],
[ 78.98201161],
[ 79.00816682],
[ 79.00069438],
[ 79.00062309],
[ 79.01292095],
[ 78.98031297],
[ 78.96549734],
[ 78.97986125],
[ 78.99405245],
[ 78.9670362 ],
[ 78.96454757],
[ 78.96479425],
[ 78.96614383],
[ 78.99380842],
[ 78.93016489],
[ 78.90415922],
43
[ 78.82333275],
[ 78.87148624],
[ 79.00924227],
[ 79.02355831],
[ 79.06008207],
[ 79.06949617],
[ 79.0980048 ],
[ 79.12365736],
[ 79.01911686],
[ 79.30277287],
[ 79.55835644],
[ 79.47596804],
[ 79.65738702],
[ 79.72570794],
[ 79.65909038],
[ 79.70762525],
[ 79.6795497 ],
[ 79.76455661],
[ 79.73191072],
[ 79.5274163 ],
[ 79.79387245],
[ 80.11141039],
[ 80.10033204],
[ 79.7247873 ],
[ 79.56165484],
[ 80.23912274],
[ 80.06197735],
[ 79.97552007],
[ 79.99499562],
[ 80.03134109],
[ 80.43693908],
[ 80.26313801],
[ 80.13030897],
[ 79.96030285],
[ 80.16366845],
[ 80.07893123],
[ 80.15284638],
[ 80.05309941],
[ 80.17926149],
[ 80.30565395],
[ 80.43409699],
[ 80.46142734],
[ 80.3626625 ],
[ 80.29619912],
[ 80.20685645],
[ 79.99018969],
[ 80.14903156],
[ 80.0923221 ],
[ 80.05093483],
[ 79.86578821],
[ 79.95180016],
[ 80.08589886],
[ 80.09202413],
[ 80.20087533],
44
[ 80.34813117],
[ 80.32491874],
[ 80.12088524],
[ 79.80086194],
[ 79.98145812],
[ 79.99482688],
[ 79.9864941 ],
[ 79.93211427],
[ 79.87800131],
[ 79.85882973],
[ 79.82686637],
[ 79.82793635],
[ 79.8021976 ],
[ 79.75848954],
[ 79.74271104],
[ 79.71871188],
[ 79.71127689],
[ 79.69757763],
[ 79.61953547],
[ 79.55368084],
[ 79.51694512],
[ 79.50413826],
[ 79.48961162],
[ 79.51048849],
[ 79.53839899],
[ 79.50846745],
[ 79.51693288],
[ 79.47238283],
[ 79.49446678],
[ 79.49115076],
[ 79.48062862],
[ 79.48000001],
[ 79.46798024],
[ 79.45457554],
[ 79.44515963],
[ 79.43029371],
[ 79.40386801],
[ 79.43261256],
[ 79.42761765],
[ 79.42188076],
[ 79.39152056],
[ 79.36561653],
[ 79.39010678],
[ 79.36051514],
[ 79.3591428 ],
[ 79.33649049],
[ 79.36758544],
[ 79.3787324 ],
[ 79.35486537],
[ 79.33039274],
[ 79.34032933],
[ 79.3356268 ],
[ 79.34869454],
[ 79.33146635],
45
[ 79.32397755],
[ 79.30631086],
[ 79.31494732],
[ 79.3188859 ],
[ 79.29456379],
[ 79.29525658],
[ 79.28911181],
[ 79.22487952],
[ 79.29198063],
[ 79.24706401],
[ 79.2852689 ],
[ 79.26360997],
[ 79.23388456],
[ 79.21120348],
[ 79.17808234],
[ 79.21201984],
[ 79.20115617],
[ 79.25422017],
[ 79.22296419],
[ 79.20335655],
[ 79.18426118],
[ 79.19576436],
[ 79.21094674],
[ 79.20164434],
[ 79.14831966],
[ 79.14394657],
[ 79.17584333],
[ 79.17661474],
[ 79.17858397],
[ 79.21956417],
[ 79.15694061],
[ 79.135144 ],
[ 79.10696226],
[ 79.13712371],
[ 79.11832785],
[ 79.13082007],
[ 79.16990762],
[ 79.26594357],
[ 79.20633508],
[ 79.17997123],
[ 79.19056313],
[ 79.24559072],
[ 79.2991557 ],
[ 79.32951848],
[ 79.37792683],
[ 79.39009448],
[ 79.37335895],
[ 79.32990797],
[ 79.36737757],
[ 79.47698453],
[ 79.31724606],
[ 79.40466511],
[ 79.36099632],
[ 79.20698687],
46
[ 79.37380626],
[ 79.35396693],
[ 79.35145827],
[ 79.59395697],
[ 79.36311388],
[ 79.43857128],
[ 79.45080374],
[ 79.46287398],
[ 79.4745953 ],
[ 79.48733725],
[ 79.49857475],
[ 79.51074651],
[ 79.53558149],
[ 79.55735569],
[ 79.56731596],
[ 79.57694579],
[ 79.58747474],
[ 79.59657798],
[ 79.6054916 ],
[ 79.61518184],
[ 79.62350183],
[ 79.6315964 ],
[ 79.63943431],
[ 79.64801372],
[ 79.66336019],
[ 79.67956461],
[ 79.69514119],
[ 79.70110459],
[ 79.70772658],
[ 79.71314147],
[ 79.71830188],
[ 79.72709709],
[ 79.73175908],
[ 79.73616658],
[ 79.74033162],
[ 79.74503261],
[ 79.74865686],
[ 79.75202663],
[ 79.75589419],
[ 79.75873518],
[ 79.77531223],
[ 79.77734342],
[ 79.77978848],
[ 79.78129089],
[ 79.78253883],
[ 79.7841403 ],
[ 79.7848576 ],
[ 79.78532229],
[ 79.78553249],
[ 79.78569379],
[ 79.78872396],
[ 79.78840443],
[ 79.78732337],
[ 79.79075994],
47
[ 79.79356865],
[ 79.79181705],
[ 79.78940698],
[ 79.78711987],
[ 79.78416431],
[ 79.78095208],
[ 79.77750428],
[ 79.77410634],
[ 79.77008522],
[ 79.76585417],
[ 79.76161751],
[ 79.75685988],
[ 79.75179211],
[ 79.74903105],
[ 79.74581903],
[ 79.73969011],
[ 79.73539773],
[ 79.73104008],
[ 79.72419096],
[ 79.71701322],
[ 79.70965295],
[ 79.69292153],
[ 79.64165821],
[ 79.63763914],
[ 79.60936788],
[ 79.58500525],
[ 79.60513488],
[ 79.57860507],
[ 79.57192996],
[ 79.54783203],
[ 79.54629962],
[ 79.56114733],
[ 79.58982235],
[ 79.57081399],
[ 79.49032151],
[ 79.51132877],
[ 79.51807422],
[ 79.50144976],
[ 79.52572317],
[ 79.46713954],
[ 79.41465323],
[ 79.38522414],
[ 79.35796392],
[ 79.29057735],
[ 79.29441399],
[ 79.2894801 ],
[ 79.29945966],
[ 79.30419476],
[ 79.30191289],
[ 79.30330844],
[ 79.30997796],
[ 79.32445166],
[ 79.33891301],
[ 79.36297942],
48
[ 79.37260667],
[ 79.38661781],
[ 79.36348905],
[ 79.42324434],
[ 79.3744849 ],
[ 79.41341231],
[ 79.3933769 ],
[ 79.47642139],
[ 79.5247705 ],
[ 79.52291583],
[ 79.50708902],
[ 79.72083884],
[ 79.67310478],
[ 79.56155247],
[ 79.60537555],
[ 79.86037116],
[ 79.95368936],
[ 79.85797184],
[ 79.86796146],
[ 79.83841639],
[ 79.90992159],
[ 79.81168094],
[ 79.93210744],
[ 79.84700995],
[ 80.08287099],
[ 80.21543579],
[ 80.22568745],
[ 80.45177154],
[ 80.0553673 ],
[ 80.46212542],
[ 80.20608625],
[ 80.26359789],
[ 80.24088163],
[ 80.13932885],
[ 80.06719099],
[ 80.34040847]])
In [353]: temp_predvect_o2.size
#Sanity Check
Out[353]: 350
In [354]: trainData_out_temp.size
#Sanity Check
Out[354]: 1080
In [355]: o2_predcomp_temp=pd.DataFrame({’Predicted_Temp’:temp_predvect_o2[:,0],\
’Actual_Temp’:testData_out_temp[:,0]})
#Importing to a pandas dataframe
o2_predcomp_temp.head()
Out[355]: Actual Temp Predicted Temp
0 79.70 79.261092
1 79.52 79.239357
49
2 81.86 79.227060
3 79.70 79.215830
4 79.52 79.201193
In [357]: o2_predcomp_temp.index=testData.index
o2_predcomp_temp.head()
Out[357]: Actual Temp Predicted Temp
2016-05-02 00:00:00 79.70 79.261092
2016-05-02 00:20:00 79.52 79.239357
2016-05-02 00:30:00 81.86 79.227060
2016-05-02 00:40:00 79.70 79.215830
2016-05-02 00:50:00 79.52 79.201193
Below, we plot the timeseries of predicted vs actual temperature. An immediate takeaway from thisvisualization is that the fit is more accurate than the one observed in the first order.
In [428]: plot_17=o2_predcomp_temp.plot(figsize=[20,8])
plot_17.set_ylabel(’Internal Temperature’, color=’b’)
plot_17.set_xlabel(’Time Series’, color=’b’)
Out[428]: <matplotlib.text.Text at 0x115fc59d0>
Humidity We now repeat the process for the second order multivariate regression for humidity.
In [360]: a_funct_hum_o2 = np.linalg.lstsq(trainData_pred_o2, \
trainData_out_hum)
#Calculating coefficient vector, loss, rank, and blah
In [361]: a_funct_hum_o2
Out[361]: (array([[ 3.01610187e+00],
[ 3.01610187e+00],
[ -4.39954966e-03],
[ -1.39898784e-02],
[ -1.88188131e-02],
[ 9.34283503e-01],
[ 3.01610187e+00],
50
[ -4.39954970e-03],
[ -1.39898784e-02],
[ -1.88188131e-02],
[ 9.34283503e-01],
[ 4.44596105e-06],
[ 4.21982402e-05],
[ 1.16691276e-05],
[ -7.32551283e-05],
[ 1.04690212e-03],
[ 5.49603421e-03],
[ -1.12300784e-02],
[ -1.41626278e-04],
[ -8.73702840e-03],
[ -6.46861709e-02]]),
array([], dtype=float64),
15,
array([ 7.45986678e+06, 3.55188748e+05, 1.95361515e+05,
1.20910776e+05, 4.58000766e+04, 1.77848563e+04,
6.44400964e+03, 3.52987078e+03, 1.11685798e+03,
6.59266099e+02, 3.16040890e+02, 8.76511721e+01,
4.58666434e+01, 5.27359788e+00, 5.69081560e-01,
6.44149164e-11, 3.62070457e-11, 2.46047174e-12,
3.79561191e-13, 7.49638951e-14, 3.40372875e-14]))
In [362]: hum_predvect_o2=np.dot(testData_pred_o2,a_funct_hum_o2[0])
#Prediction basing on estimated params
hum_predvect_o2
Out[362]: array([[ 36.94742507],
[ 37.06051328],
[ 37.09297625],
[ 37.08126674],
[ 37.200566 ],
[ 37.24936416],
[ 37.28476052],
[ 37.32440838],
[ 37.21884328],
[ 37.10890248],
[ 37.09507353],
[ 37.29980235],
[ 37.18224877],
[ 37.12852365],
[ 37.06934887],
[ 37.15338781],
[ 37.23877887],
[ 37.14027356],
[ 36.98012214],
[ 37.01220851],
[ 37.0375901 ],
[ 36.94963994],
[ 37.06327434],
[ 37.02542519],
[ 37.13920748],
[ 36.66322381],
[ 37.18824212],
51
[ 37.63325982],
[ 37.22514343],
[ 37.13098688],
[ 36.99931151],
[ 36.96778264],
[ 37.12554567],
[ 36.84268285],
[ 36.74075407],
[ 36.74718548],
[ 36.8576679 ],
[ 36.57527356],
[ 36.63747832],
[ 36.76127063],
[ 36.95512298],
[ 37.22800239],
[ 37.08396257],
[ 37.21558138],
[ 37.32798996],
[ 37.43898509],
[ 38.13832119],
[ 38.18754014],
[ 38.43680711],
[ 38.29158363],
[ 38.24648004],
[ 37.70068733],
[ 37.6890759 ],
[ 38.23852237],
[ 36.24636167],
[ 37.22488093],
[ 36.18993785],
[ 36.07113121],
[ 36.67667614],
[ 36.13431599],
[ 36.94408314],
[ 35.90210166],
[ 36.76720858],
[ 35.65486806],
[ 36.31789059],
[ 34.56262677],
[ 35.04053705],
[ 35.86590409],
[ 34.63672815],
[ 34.50639377],
[ 35.30404882],
[ 35.87867822],
[ 35.90795385],
[ 35.05317725],
[ 34.88207164],
[ 35.31208764],
[ 34.87814455],
[ 34.07515819],
[ 34.21674803],
[ 34.41140419],
[ 33.97190553],
52
[ 34.87006642],
[ 34.07124904],
[ 33.66164081],
[ 33.13398557],
[ 34.18269706],
[ 34.01745645],
[ 33.57423476],
[ 33.56923775],
[ 33.73823863],
[ 33.41848885],
[ 33.79087662],
[ 33.84794439],
[ 33.60594453],
[ 33.43447777],
[ 33.04869539],
[ 32.93461534],
[ 32.61737369],
[ 32.45079974],
[ 32.4427548 ],
[ 32.76847465],
[ 32.81692624],
[ 32.79501637],
[ 32.62623656],
[ 32.59756998],
[ 32.70245473],
[ 32.85499103],
[ 32.84601459],
[ 32.73302901],
[ 32.68450074],
[ 32.67546657],
[ 33.00458141],
[ 33.05679948],
[ 32.60054536],
[ 32.72799082],
[ 32.7624013 ],
[ 32.47100198],
[ 32.70324715],
[ 33.00029885],
[ 33.11814471],
[ 32.92994073],
[ 32.94649992],
[ 33.14082963],
[ 32.88534001],
[ 33.14234777],
[ 32.86271011],
[ 33.17565862],
[ 33.14094359],
[ 33.06226189],
[ 33.05433673],
[ 32.79685589],
[ 32.78057734],
[ 32.74434628],
[ 32.57857061],
[ 32.40550492],
53
[ 32.78204778],
[ 32.82516927],
[ 32.46105924],
[ 32.1095003 ],
[ 31.95456472],
[ 32.20977095],
[ 31.89607712],
[ 31.82962505],
[ 31.66005611],
[ 31.93286739],
[ 32.04826017],
[ 31.9001194 ],
[ 31.92569857],
[ 31.92618803],
[ 31.86655378],
[ 32.02275311],
[ 31.86186185],
[ 31.79414182],
[ 31.59306959],
[ 31.67627217],
[ 31.71600073],
[ 31.51049409],
[ 31.42466805],
[ 31.39656041],
[ 30.93582844],
[ 31.51235181],
[ 30.81866832],
[ 31.14890098],
[ 30.97226316],
[ 30.74715358],
[ 30.54331185],
[ 30.38452734],
[ 30.72479318],
[ 30.67062724],
[ 31.06111962],
[ 30.84058162],
[ 30.75211621],
[ 30.56088071],
[ 30.62940524],
[ 30.76745644],
[ 30.76339368],
[ 30.40443464],
[ 30.41687608],
[ 30.71017711],
[ 30.73850647],
[ 30.74540579],
[ 30.98425961],
[ 30.50231405],
[ 30.30587165],
[ 30.21589172],
[ 30.22986926],
[ 30.05601929],
[ 30.09544616],
[ 30.17837581],
54
[ 30.44082933],
[ 30.15626898],
[ 30.25482347],
[ 29.89334156],
[ 30.00336374],
[ 30.06369182],
[ 30.27579956],
[ 30.33468662],
[ 29.81265224],
[ 30.07814424],
[ 30.05181979],
[ 30.06826494],
[ 30.00362681],
[ 29.53423549],
[ 29.62301269],
[ 29.22277132],
[ 28.60856155],
[ 28.97832946],
[ 28.62736151],
[ 28.7934091 ],
[ 29.26901184],
[ 29.33528347],
[ 28.83991747],
[ 28.80624227],
[ 28.77341583],
[ 28.74156904],
[ 28.69753454],
[ 28.66749467],
[ 28.66357442],
[ 28.67099258],
[ 28.66706819],
[ 28.64076362],
[ 28.61537503],
[ 28.57784458],
[ 28.55427981],
[ 28.53166363],
[ 28.4968883 ],
[ 28.47606605],
[ 28.45618652],
[ 28.43722572],
[ 28.40611964],
[ 28.35977777],
[ 28.36862295],
[ 28.37821502],
[ 28.3649229 ],
[ 28.33959647],
[ 28.3280721 ],
[ 28.31745743],
[ 28.34294652],
[ 28.33406751],
[ 28.3260982 ],
[ 28.31913513],
[ 28.3001399 ],
[ 28.29491311],
55
[ 28.29059602],
[ 28.27447962],
[ 28.27199535],
[ 28.31964316],
[ 28.31913892],
[ 28.30693298],
[ 28.30826156],
[ 28.31049984],
[ 28.30092948],
[ 28.3051473 ],
[ 28.3101281 ],
[ 28.3160186 ],
[ 28.3180237 ],
[ 28.37245082],
[ 28.36921166],
[ 28.37975853],
[ 28.41632001],
[ 28.45362838],
[ 28.45447788],
[ 28.4686594 ],
[ 28.47095663],
[ 28.48717218],
[ 28.50430646],
[ 28.5221402 ],
[ 28.52810668],
[ 28.54822086],
[ 28.56879713],
[ 28.57751873],
[ 28.5999188 ],
[ 28.62372139],
[ 28.67379933],
[ 28.7119205 ],
[ 28.73839377],
[ 28.78752308],
[ 28.82449126],
[ 28.85334954],
[ 28.883674 ],
[ 28.91436069],
[ 28.40427458],
[ 27.53860714],
[ 27.61175552],
[ 27.43177535],
[ 27.20393171],
[ 27.42970503],
[ 27.20414728],
[ 27.18985689],
[ 26.84308963],
[ 26.89690263],
[ 27.19426835],
[ 27.82129411],
[ 28.12585732],
[ 27.34931848],
[ 27.4730532 ],
[ 27.64830809],
56
[ 27.52311711],
[ 27.87445943],
[ 28.0197126 ],
[ 28.78889207],
[ 28.79015217],
[ 29.27083651],
[ 30.21482415],
[ 30.27934272],
[ 30.34136539],
[ 30.36562434],
[ 30.11304281],
[ 29.67846627],
[ 29.68887111],
[ 29.72686336],
[ 29.78128764],
[ 29.62712164],
[ 29.64158225],
[ 29.47487071],
[ 29.32038537],
[ 28.97358725],
[ 29.18580507],
[ 28.92757311],
[ 29.1713987 ],
[ 28.8287174 ],
[ 28.89287462],
[ 29.1641531 ],
[ 28.53554751],
[ 28.23957023],
[ 28.53430848],
[ 27.92397824],
[ 27.43470891],
[ 27.67584814],
[ 27.22307919],
[ 27.30704508],
[ 26.66309249],
[ 27.10056019],
[ 27.24171841],
[ 27.17882823],
[ 26.67499559],
[ 26.88967711],
[ 26.40210763],
[ 26.48112716],
[ 27.04700785],
[ 27.03364485],
[ 27.86796989],
[ 26.84631496],
[ 27.35860494],
[ 26.57250651],
[ 26.50467265],
[ 26.7387155 ],
[ 26.61551909],
[ 26.17752487],
[ 26.87165016]])
57
In [363]: hum_predvect_o2.size
Out[363]: 350
In [364]: o2_predcomp_hum=pd.DataFrame({’Predicted_Hum’:hum_predvect_o2[:,0],\
’Actual_Hum’:testData_out_hum[:,0]})
#trainData_out_temp
o2_predcomp_hum.head()
Out[364]: Actual Hum Predicted Hum
0 37.0 36.947425
1 36.9 37.060513
2 36.7 37.092976
3 37.0 37.081267
4 36.1 37.200566
In [365]: o2_predcomp_hum.index=testData.index
o2_predcomp_hum.head()
Out[365]: Actual Hum Predicted Hum
2016-05-02 00:00:00 37.0 36.947425
2016-05-02 00:20:00 36.9 37.060513
2016-05-02 00:30:00 36.7 37.092976
2016-05-02 00:40:00 37.0 37.081267
2016-05-02 00:50:00 36.1 37.200566
Notice, again, that the visualization below shows an improved second order fit for humidity than the onewe noticed in the first order regression.
In [429]: plot_18=o2_predcomp_hum.plot(figsize=[20,8],fontsize=12)
plt.legend(ncol=2)
plot_18.set_ylabel(’Internal Humidity’, color=’b’)
plot_18.set_xlabel(’Time Series’, color=’b’)
Out[429]: <matplotlib.text.Text at 0x117aea190>
58
Statistical Inference
We now lay out the framework for an ANOVA (Analysis Of Variance) test, which, given multiple possiblepredictive models could be used to judge the relative significance of a model when compared to other models.(For example, a comparison between first and higher order regression, and if one is significantly better thanthe other) We lay out a framework for carrying out an ANOVA F Test, but stop short of comparing multiplemodels using the F-Test given time consraints. Future work could include a robust ANOVA F Test tocompare between various predictive models to determine one that best fits the population of the observeddata. As mentioned previously, error can be due to either one of two reasons:
1) The lack of expressiveness in a regression model
2) Predictor variables
These together comprose of what we will refer to as regression error. The other component of error,not mentioned previously, is more vague and difficult to isolate. Therefore, we attribute it to unexplained“error”.
First Order Regression Constants We first calculate degrees of freedom. The regression degree offreedom is given by the number of input variables in the first order regreesion.
In [367]: df_reg=a_funct_temp_o1[0].size
#Regression Degree of Freedom
df_reg
Out[367]: 5
The total degrees of freedom is equal to the training set size.
In [368]: df_tot=o1_predcomp_temp.index.size
df_tot
Out[368]: 350
The degrees of freeom of error is given by the difference between total degrees of freedom and theregression degrees of freedom minus 1.
In [369]: df_err=df_tot-df_reg-1
df_err
Out[369]: 344
Temperature Now, we calculate the regression and error sum of squared losses for the temperatureregression model.
In [370]: SSerr_temp_o1=numpy.linalg.norm(o1_predcomp_temp.Actual_Temp-\
o1_predcomp_temp.Predicted_Temp)
#Sum of squared errors between predicted and actual values for temperatures (SSerr)
SSerr_temp_o1
Out[370]: 8.3417961142492878
In [371]: SSreg_temp_o1=numpy.linalg.norm(o1_predcomp_temp.Predicted_Temp\
-np.mean(o1_predcomp_temp.Predicted_Temp))
#Sum of squared errors between predicted values, and the mean of the predicted values (SSreg)
SSreg_temp_o1
Out[371]: 7.3338875775485022
59
In [372]: SStot_temp_o1=SSerr_temp_o1+SSreg_temp_o1
#Total Sum of squared errors is equal to SSreg+SSerr
SStot_temp_o1
Out[372]: 15.67568369179779
We now compute the mean of these squared errors that would give us the average squared error perdegree of freedom.
In [373]: MSreg_temp_o1=SSreg_temp_o1/df_reg
#Mean of Squared Error for Regression
MSreg_temp_o1
Out[373]: 1.4667775155097005
In [374]: MSErr_temp_o1=SSerr_temp_o1/df_err
#Also termed as regression Variance S^2.
#The square root of this gives the root mean squared error.
MSErr_temp_o1
Out[374]: 0.024249407308864209
In [375]: F_temp_o1=MSreg_temp_o1/MSErr_temp_o1
#Computing the F-Statistic to calculate the significance of the model
F_temp_o1
Out[375]: 60.487149101311424
Given the Mean squared errors for the explainable error (MSreg), and unexplainable error (MSerr), theANOVA F test seeks to calculate the F-Statistic, which is calculated by finding the ratio between MSreg andMSerr. Large values for the F-Statistic indicate large values of MSreg, which indicate that fitted values arefar from the overall mean, thus indicating that the expected response would change significantly along withthe size of the regression plane, as is the case here. Therefore, there is strong evidence for the implementationof higher order models.
Next, we compute the Rˆ2 goodness of fit, which as expected, is not ideal.
In [376]: R_Sq_temp_01=SSreg_temp_o1/SStot_temp_o1
R_Sq_temp_01
Out[376]: 0.46785120966595645
Testing Individual Slopes (Temperature) We now compute the variance-covariance matrix. Thediagonal elements of the variance-covariance matrix give us a measure of the sensitivity of the final outputto a specific input, keeping all else constant. For example, cell 2x2 in the ‘variance vect temp o1’ dataframegives us the sensitivity of the internal temperature to the external humidity, keeping all else constant. Thematrix, therefore, gives us a measure of causal inference, giving us an ordered list of the factors that theinternal temperature is most sensitive to.
In [413]: variance_vect_temp_o1=np.multiply(linalg.inv(np.dot(np.transpose(testData_pred),\
testData_pred)),MSErr_temp_o1)
variance_vect_temp_o1
Out[413]: array([[ 2.88375044e-02, 8.46857825e-08, -4.09911905e-04,
-7.83569224e-05, 6.86531017e-04],
[ 8.46857825e-08, 2.02745805e-09, -2.78077457e-08,
1.46927160e-08, -2.50346951e-08],
[ -4.09911905e-04, -2.78077457e-08, 6.87674723e-06,
60
4.59742149e-07, -1.19205705e-05],
[ -7.83569224e-05, 1.46927160e-08, 4.59742149e-07,
6.40577242e-07, -8.47686074e-07],
[ 6.86531017e-04, -2.50346951e-08, -1.19205705e-05,
-8.47686074e-07, 5.12000045e-05]])
In [411]: variance_vect_temp_o1=pd.DataFrame({’Constant’:variance_vect_temp_o1[:,0],\
’Solar_Radiation’:variance_vect_temp_o1[:,1],\
’External_Temperature’:variance_vect_temp_o1[:,2],\
’External_Humidity’:variance_vect_temp_o1[:,3],\
’Wind Speed’:variance_vect_temp_o1[:,4]})
variance_vect_temp_o1
Out[411]: Constant External Humidity External Temperature Solar Radiation \0 2.883750e-02 -7.835692e-05 -4.099119e-04 8.468578e-08
1 8.468578e-08 1.469272e-08 -2.780775e-08 2.027458e-09
2 -4.099119e-04 4.597421e-07 6.876747e-06 -2.780775e-08
3 -7.835692e-05 6.405772e-07 4.597421e-07 1.469272e-08
4 6.865310e-04 -8.476861e-07 -1.192057e-05 -2.503470e-08
Wind Speed
0 6.865310e-04
1 -2.503470e-08
2 -1.192057e-05
3 -8.476861e-07
4 5.120000e-05
As is clear from the diagonal elements of the matrix above, the internal temperature’s sensitivity to thepredictor variables is as follows:
1) Wind Speed
2) External Temperature
3) Tied between Solar Radiation / External Humidity
Humidity We carry out the same statistical inference process for the first order humidity regressionas the one used for temperature. We will, therefore, not go into specific detail of each step, but will makenote of interesting results worth taking note of.
In [390]: SSerr_hum_o1=numpy.linalg.norm(o1_predcomp_hum.Actual_Hum-\
o1_predcomp_hum.Predicted_Hum)
#SSErr
SSerr_hum_o1
Out[390]: 37.057462760024826
In [391]: SSreg_hum_o1=numpy.linalg.norm(o1_predcomp_hum.Predicted_Hum-\
np.mean(o1_predcomp_hum.Predicted_Hum))
SSreg_hum_o1
Out[391]: 60.539649317778931
In [392]: SStot_hum_o1=SSerr_hum_o1+SSreg_hum_o1
SStot_hum_o1
Out[392]: 97.597112077803757
61
In [393]: MSreg_hum_o1=SSreg_hum_o1/df_reg
MSreg_hum_o1
Out[393]: 12.107929863555785
In [394]: MSErr_hum_o1=SSerr_hum_o1/df_err
#Regression Variance S^2
MSErr_hum_o1
Out[394]: 0.10772518244193263
In [395]: F_hum_o1=MSreg_hum_o1/MSErr_hum_o1
F_hum_o1
Out[395]: 112.39646653726813
Again, notice that we arrive on a reasoably large value for the F-Statistic, thus indicating that higherorder models could potentially improve on this model.
First order humidity regression gives a better fit than the first order temperature regression fit.
In [396]: R_Sq_hum_01=SSreg_hum_o1/SStot_hum_o1
R_Sq_hum_01
Out[396]: 0.62030164652328168
Testing Individual Slopes (Humidity)
In [407]: testData_pred=testData_pred.as_matrix(columns=None)
In [408]: variance_vect_hum_o1=np.multiply(linalg.inv(np.dot(np.transpose(testData_pred)\
,testData_pred)),MSErr_hum_o1)
variance_vect_hum_o1
Out[408]: array([[ 1.28107272e-01, 3.76206777e-07, -1.82098615e-03,
-3.48091549e-04, 3.04983450e-03],
[ 3.76206777e-07, 9.00674749e-09, -1.23532688e-07,
6.52706883e-08, -1.11213733e-07],
[ -1.82098615e-03, -1.23532688e-07, 3.05491528e-05,
2.04235123e-06, -5.29557533e-05],
[ -3.48091549e-04, 6.52706883e-08, 2.04235123e-06,
2.84569018e-06, -3.76574717e-06],
[ 3.04983450e-03, -1.11213733e-07, -5.29557533e-05,
-3.76574717e-06, 2.27450088e-04]])
In [409]: variance_vect_hum_o1=pd.DataFrame({’Constant’:variance_vect_hum_o1[:,0],\
’Solar_Radiation’:variance_vect_hum_o1[:,1],\
’Temperature’:variance_vect_hum_o1[:,2],\
’Humidity’:variance_vect_hum_o1[:,3],\
’Wind Speed’:variance_vect_hum_o1[:,4]})
variance_vect_hum_o1
Out[409]: Constant Humidity Solar Radiation Temperature Wind Speed
0 0.128107 -3.480915e-04 3.762068e-07 -0.001821 0.003050
1 0.000000 6.527069e-08 9.006747e-09 -0.000000 -0.000000
2 -0.001821 2.042351e-06 -1.235327e-07 0.000031 -0.000053
3 -0.000348 2.845690e-06 6.527069e-08 0.000002 -0.000004
4 0.003050 -3.765747e-06 -1.112137e-07 -0.000053 0.000227
62