
PYTHON IN EARTH SCIENCE

A BRIEF INTRODUCTION

by

Sujan Koirala and Jake Nelson

Version 1.0

February, 2017.

Department of Biogeochemical Integration,

Max Planck Institute for Biogeochemistry

Jena, Germany

FOREWORD

This document is a summary of our experiences in learning to use Python over the last

several years. It is not intended to be a standalone document that will help the user to

solve every problem. What we hope is to encourage new users to delve into a wonderful

programming language.

Sujan Koirala and Jake Nelson

[email protected]

[email protected]

Jena, Germany

February, 2017


CONTENTS

1 Installation and Package Management
  1.1 Introduction
    1.1.1 Python, a brief history
    1.1.2 Python 2, and Python 3
  1.2 Environments and packages
    1.2.1 Using other people’s code
    1.2.2 Which package manager to use?
    1.2.3 Versions, packages, environments, why so complicated?
  1.3 Installing Anaconda
    1.3.1 Windows installation notes
    1.3.2 OSX installation notes
    1.3.3 Linux installation notes
  1.4 Creating your first environment
    1.4.1 Installing a package

2 Python Data Types
  2.1 Basic Data Types
    2.1.1 Boolean Data
    2.1.2 Numbers
    2.1.3 Strings
    2.1.4 Bytes
  2.2 Combined Data Types
    2.2.1 Lists
    2.2.2 Tuples
    2.2.3 Sets
    2.2.4 Dictionaries
    2.2.5 Arrays


3 Input/Output of files
  3.1 Read Text File
    3.1.1 Plain Text
    3.1.2 Comma Separated Text
    3.1.3 Unstructured Text
  3.2 Save Text File
  3.3 Read Binary Data
  3.4 Write Binary Data
  3.5 Read NetCDF Data
  3.6 Write NetCDF Data
  3.7 Read MatLab Data
  3.8 Read Excel Data

4 Data Operations in Python
  4.1 Size and Shape
  4.2 Slicing and Dicing
  4.3 Built-in Mathematical Functions
  4.4 Matrix operations
  4.5 String Operations
  4.6 Other Useful Functions

5 Essential Python Scripting
  5.1 Control Flow Tools
    5.1.1 if Statement
    5.1.2 for Statement
    5.1.3 while Statement
    5.1.4 break and continue
    5.1.5 range
  5.2 Python Functions
  5.3 Python Modules
  5.4 Python Classes
  5.5 Additional Relevant Modules
    5.5.1 sys Module


    5.5.2 os Module
    5.5.3 Errors and Exceptions

6 Advanced Statistics and Machine Learning
  6.1 Quick overview
    6.1.1 required packages
    6.1.2 Overview of data
  6.2 Import and prepare the data
  6.3 Setting up the gapfillers
  6.4 Actually gapfilling
  6.5 And now the plots!
    6.5.1 scatter plots!
    6.5.2 Distributions with KDE
  6.6 Bonus points!

7 Data Visualization and Plotting
  7.1 Plotting a simple figure
  7.2 Multiple plots in a figure
  7.3 Plot with Dates
  7.4 Scatter Plots
  7.5 Playing with the Elements
  7.6 Map Map Map!
    7.6.1 Global Data
    7.6.2 Customizing a Colorbar


1. INSTALLATION OF PYTHON AND PACKAGE MANAGEMENT

This chapter provides information on the installation of core Python and

addition of packages.

1.1. INTRODUCTION

If you are currently using a recent Mac or Linux operating system, open a terminal and

type,

:∼ $ python

and you should see something like,

Python 2.7.12

Type "help", "copyright", "credits" or "license" for more information.

>>>

You have just entered the native installation of Python on your computer, no extra

steps needed. This is because, though it is a great tool for earth science and data

analytics, Python is a general purpose language that is used by all sorts of programs

and utilities. While it is nice that Python is a very open and widely used tool, one should

also take care that this native installation is not modified to the point that the other

useful and essential utilities that depend on it are disrupted. For instance, a package or

command may no longer be installed where it originally was by the operating system.

For this reason, this chapter will outline how to install a modern version of Python, as

well as many packages useful for data science, in a tidy environment all its own.

1.1.1. PYTHON, A BRIEF HISTORY

As the story goes, in 1989 Guido van Rossum decided he needed something to do over

the Christmas holidays, and instead of reading a nice book or learning to brew his own

beer, he decided to develop the scripting language with the name of Python, named

after Monty Python’s Flying Circus. Since then, Python has come to be known by several

core principles, most notable of which are the focus on readability and requiring fewer

lines of code. Because of this, lines of code can almost be read as a sentence in plain

English. For example, if I would like to add one to every number in my list, I would write,

>>> [number+1 for number in MyList]

Though this may look daunting if you are new to coding, if you read it out loud you

can almost hear what it does. And along this line, the Python philosophy tends not to

be that there are many clever ways to do one thing, but one very clear way. Because of

these ideologies Python can be a very useful and rewarding coding language to learn,

which is reflected in its popularity.
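The list comprehension shown earlier in this section can be tried out as follows. This is a minimal sketch: the name my_list and its values are made up for illustration, and the explicit loop is included only for comparison.

```python
# A hypothetical list of numbers (values chosen only for illustration).
my_list = [1, 2, 3]

# Read it aloud: "number plus one, for each number in my_list".
plus_one = [number + 1 for number in my_list]

# The same result written as an explicit loop, for comparison.
plus_one_loop = []
for number in my_list:
    plus_one_loop.append(number + 1)

print(plus_one)  # [2, 3, 4]
```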

1.1.2. PYTHON 2, AND PYTHON 3

As you start in Python, you will quickly find yourself wondering why there are two

different versions being used. Python 2 was released in 2000 as the first major update,

and many programs have been written using this flavor. However, in 2010 with the

release of version 2.7, it was announced that Python 2 would be phased out in favor of

the new Python 3, so there is no plan for a version 2.8. This major update from 2 to

3 was made to change some small yet significant things to the language, such as how

it handles text data and iterates through lists and dictionaries. The idea is that it is

better to update a language to fix things, than always dealing with small bugs because

of refusal to change. As Python 2 is scheduled to be retired in the next few years, this

manual will focus on using Python 3. This does mean that your Python 3 code may

not work with your native Python 2 installation, but in the realm of data science, as

you will be using so many specialized packages of code, this would be the case anyway.

In the end, you will be using a self-contained Python environment that contains your

Python installation, as well as all the code you will be using, in one neat little box.

1.2. ENVIRONMENTS AND PACKAGES

1.2.1. USING OTHER PEOPLE’S CODE

As Python is a general purpose language, the basic functionality out of the box is also

very general: things such as basic math, file manipulation, and printing output. So

if you want to do anything beyond what is defined in the core language, you need to

write your own little bit of code to do it. However, as you are taking a Python course,

you can assume that the first time you need a bit of code that the core Python doesn’t

have built in, something like calculating the standard deviation of a set of numbers,

someone else will have probably run into the same issue before you. Luckily, the Python

community is very active in writing these bits of code and sharing them so that you

don’t have to write every function from scratch. Not only that, many of these little bits

of code have been bundled into large collections of code called packages. For example,

the mean, median, standard deviation, percentile, and other statistical functions are

already built into a package called NumPy (Numerical Python), which gives you access

to a whole bunch of bits of code. Not only that, there are entire package managers that

will take care of downloading and installing the package, as well as making sure it

plays nicely with all the other packages you are using. All you have to do is tell it which

package!

1.2.2. WHICH PACKAGE MANAGER TO USE?

Probably the most common package manager is called pip. pip is a wonderfully useful

tool that is widely supported, which you will not use. Instead you will use Anaconda

for the following reasons:

• Anaconda is designed for data science.

• Anaconda will handle not only the Python packages, but non-Python things such

as HDF5 (which allows us to read some data files) and the Math Kernel Library.

It will even manage an R installation.

• Anaconda also manages environments, which:

  – Keep our Python installations working together.

  – Keep separate collections of packages in case some don’t work well together.

  – Are duplicatable and exportable, so our work can be replicated.

1.2.3. VERSIONS, PACKAGES, ENVIRONMENTS, WHY SO COMPLICATED?

Though this all may seem a bit complicated to just make a plot or do some math, it

becomes necessary because of two main issues: the computer needs to know where to

look for things, and what to call them.

Just like when you go back to look at the wonderful photos you took on vacation

3 years ago only to find a giant mess of folders and sub-folders to go through, your

computer also has to look through all its memory to find where a bit of code might be

located. When properly managed, all the files are put in the appropriate place, where

the computer can easily find them. Similarly, if I have a file in the folder Photos/

called MyBestPicture.jpg, and I have a different file in the folder Photos2/ called

MyBestPicture.jpg, when I tell my computer I want MyBestPicture.jpg, it has no idea

which one I mean. In this way, by using these tools, you keep everything nice and

tidy.

1.3. INSTALLING ANACONDA

Anaconda is a commercially maintained package manager designed for data science.

As such they have made it quite easy to install on Windows, Mac, and Linux. Simply go

to https://www.continuum.io/downloads, find your operating system, and download

the appropriate Python 3.6 version installer for your operating system. Again, you

want to use version 3.6, but if you end up mixing up versions or already have another

version installed don’t panic, you can create a Python 3.6 environment later.

1.3.1. WINDOWS INSTALLATION NOTES

Installation on Microsoft Windows is fairly straightforward, but can take quite some

time. Simply follow the graphical installer; the only thing to change is to uncheck

the option to register Anaconda as the default Python installation. Though this is not

as vital as with Unix based systems, it is still a good idea. After the long installation

prompt, you can access an Anaconda command line via Anaconda Prompt in the Start

Menu.

1.3.2. OSX INSTALLATION NOTES

Installation on OSX should be quite straightforward: simply follow the installation

guide of the graphical installer.

1.3.3. LINUX INSTALLATION NOTES

Once the file has been downloaded, open a terminal and navigate to where the file was

saved. The file installer is a bash script, which can be run by entering

:∼ $ bash Anaconda3-FILE-NAME.sh

where Anaconda3-FILE-NAME.sh is the name of your file. The package will ask

you to review the licence information and agree. You will then be asked if you would

like to install Anaconda in another location, and you can simply install into the default

location. The installer will then proceed to install Anaconda on your machine. Once

the installation is complete, the installer will ask "You may wish to edit your .bashrc

or prepend the Anaconda3 install location:", followed by a suggested command that

looks something like,

export PATH=/YOUR/PATH/TO/anaconda3/bin:$PATH

In order to make Anaconda work, you need to add the file path to Anaconda to a

variable the operating system uses called $PATH. To do this, you can add a modified

version of this line to a file called .bashrc in your home folder. Simply go to your home

folder and open the file .bashrc with a text editor, and at the end of the file add the

line,

export PATH=$PATH:/YOUR/PATH/TO/anaconda3/bin

where the /YOUR/PATH/TO/anaconda3/bin is the same one that Anaconda sug-

gested at the end of installation. If you forgot it, it should be something like

/home/YOURNAME/anaconda3/bin

You may notice that the Anaconda path and $PATH have been switched around. This is because

you want to add the Anaconda location to the end of $PATH, meaning that the operating

system looks in this folder last instead of first. This ensures that you don’t cause any

problems with the native Python installation.
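The effect of the ordering can be illustrated with a small Python sketch. It is a simplified model of the lookup, not how a real shell is implemented: the function first_match and the two temporary folders are hypothetical stand-ins for the system and Anaconda locations, and the check for executable permissions is left out.

```python
import os
import tempfile

def first_match(command, search_path):
    """Return the first directory in search_path that contains command."""
    for directory in search_path.split(os.pathsep):
        if os.path.isfile(os.path.join(directory, command)):
            return directory
    return None

# Two throwaway folders stand in for the system and Anaconda locations.
system_dir = tempfile.mkdtemp(prefix="system_bin_")
anaconda_dir = tempfile.mkdtemp(prefix="anaconda_bin_")
for folder in (system_dir, anaconda_dir):
    open(os.path.join(folder, "python"), "w").close()

appended = os.pathsep.join([system_dir, anaconda_dir])   # like $PATH:anaconda
prepended = os.pathsep.join([anaconda_dir, system_dir])  # like anaconda:$PATH

found_when_appended = first_match("python", appended)    # the system folder
found_when_prepended = first_match("python", prepended)  # the Anaconda folder
```

With the Anaconda folder at the end, a plain python command still resolves to the system copy, which is the behaviour described above.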

1.4. CREATING YOUR FIRST ENVIRONMENT

First, you will verify that your Anaconda installation is working. To do so, open a

command line and simply type,

:∼ $ conda

You should see a nice overview of how to use the conda command. If this is not

the case, either the installation didn’t work, or you might have a problem with your

PATH (where the computer looks for commands). But, if it worked, you can move on

to creating your first environment. You will name the environment CoursePy and you

will initially only require the numpy package. In the same command line, input:

:∼ $ conda create --name CoursePy numpy

You will be asked if you would like to proceed in installing a bunch of new packages,

way more than numpy, and you can say yes. The reason so many new packages were

listed is the magic of a package manager. The basic Python 3 with the numpy package

actually depends on all these underlying dependencies, which Anaconda kindly figures

out for you. So now you have your nice new environment, and you can activate it by

entering

:∼ $ source activate CoursePy

on Mac or Linux and

:∼ $ activate CoursePy

on Windows.

Your command line should now tell you that you are in the CoursePy environment.

If you now open a Python console by typing python in the command line, your

version should now be 3.6.0. In this same manner, you can do things like duplicate and

export your environments, or make new environments with different packages or even

different Python versions.

1.4.1. INSTALLING A PACKAGE

Now that you are in your nice new environment, you can add any package you might

need. Open a command line and enter the CoursePy environment. Now to install the

Spyder package, you simply enter,

:∼ $ conda install spyder

Anaconda will list all the package changes it will make, and ask if you would like to

proceed. Confirm yes, then let the magic happen. Now you have the Spyder IDE, which

you can use to develop code (similar concept to R Commander or the MATLAB IDE).

Anaconda has some nice documentation about how to use their software, including

how to search for packages not in their repositories, which we will not cover here. Now

that you have your installation and environment all sorted out, you can start to explore

Python itself a bit in the next chapters.

2. PYTHON DATA TYPES

This chapter provides information on the basic data types in Python. It also

introduces the basic operations used to access and manipulate the data

In Python, there are various types of data. Every piece of data has a type and a value.

Every value has a data type, but the type does not need to be declared beforehand. The most

basic data types in Python are:

1. Boolean: These are data which have only two values: True or False.

2. Numbers: These are numeric data.

3. Strings: These data are sequences of unicode characters.

4. Bytes: An immutable sequence of numbers.

Furthermore, these data types can be combined and following types of datasets can

be produced:

1. Lists: Ordered sequences of values.

2. Tuples: Ordered but immutable, i.e. cannot be modified, sequences of values.

3. Sets: Unordered bags of values.

4. Dictionaries: Unordered bag of key-value pairs.

5. Arrays: Ordered sequences of data of same type mentioned above.

2.1. BASIC DATA TYPES

This section briefly describes the basic data types, their possible values, and the various

operations that can be applied to them.

2.1.1. BOOLEAN DATA

These data are either True or False. If an expression can produce a yes or no
answer, booleans can be used to interpret the result. Such yes/no situations
are known as boolean contexts. Here is a simple example.

• Assign the value 1 to a variable (size).

In [1]: size = 1

• Check if size is less than 0.

In [2]: size < 0
Out[2]: False

✓ It is False as 1 > 0.

• Check if size is greater than 0.

In [3]: size > 0
Out[3]: True

✓ It is True as 1 > 0.

True or False can also be treated as numbers: True=1 and False=0.
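Treating booleans as numbers is handy for counting how often a condition holds. The sketch below illustrates this; the rainfall values and the 1.0 mm threshold are made up for the example.

```python
# Booleans behave like the integers 1 and 0 in arithmetic.
print(True + True)   # 2
print(False * 10)    # 0

# A hypothetical series of daily rainfall values in mm (made up for illustration).
rainfall = [0.0, 2.5, 0.0, 11.2, 0.4]

# Each comparison yields True or False; summing them counts the True values.
wet_days = sum(value > 1.0 for value in rainfall)
print(wet_days)      # 2
```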

2.1.2. NUMBERS

Python supports both integers and floating point numbers. There is no type declaration
to distinguish them; Python tells them apart by the presence
or absence of a decimal point.

• You can use the type() function to check the type of any value or variable.

In [4]: type(1)
Out[4]: int

✓ As expected, 1 is an int.

In [5]: type(1.)
Out[5]: float

✓ The decimal point at the end makes 1. a float.

In [6]: 1+1
Out[6]: 2

✓ Adding an int to an int yields an int.

In [7]: 1+1.
Out[7]: 2.0

✓ Adding an int to a float yields a float. Python coerces the int into a float
to perform the addition, then returns a float as the result.

• An integer can be converted to a float using float(), and a float can be converted to
an integer using int().

In [8]: float(2)
Out[8]: 2.0

In [9]: int(2.6)
Out[9]: 2

✓ Python truncates the float to an integer: 2.6 becomes 2 instead of 3. To
round the float to the nearest integer, use round().

In [10]: round(2.6)
Out[10]: 3

✓ In Python 3, round() with a single argument returns an int (Python 2 returned 3.0).
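The difference between truncating and rounding is easiest to see side by side. The following sketch assumes Python 3, where round() returns an int and rounds exact halves to the nearest even integer; the variable names are chosen only for illustration.

```python
# int() drops the fractional part (truncates towards zero).
truncated_pos = int(2.6)    # 2
truncated_neg = int(-2.6)   # -2

# round() goes to the nearest integer and returns an int in Python 3.
rounded = round(2.6)        # 3

# Exact halves are rounded to the nearest even integer in Python 3.
half_down = round(2.5)      # 2
half_up = round(3.5)        # 4

# An optional second argument keeps that many decimal places.
two_places = round(3.14159, 2)   # 3.14
```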

NUMERICAL OPERATIONS

• The / operator performs division.

In [11]: 1/2
Out[11]: 0.5

In [12]: 1/2.
Out[12]: 0.5

✓ In Python 3, / always performs true division and returns a float, even for two
integers. In Python 2, 1/2 evaluated to 0, so be careful when running older code.

• The // operator performs floor division: the quotient is rounded down to the
nearest integer, i.e. towards negative infinity. For positive results this is the same
as truncating, but for negative results it is not. The result is a float if either
operand is a float.

In [13]: 1.//2
Out[13]: 0.0

In [14]: -1.//2
Out[14]: -1.0

• The ‘**’ operator means “raised to the power of”. 11² is 121.

In [15]: 11**2
Out[15]: 121

In [16]: 11**2.
Out[16]: 121.0

✓ As with addition, the result is a float as soon as one of the operands is a float.

• The ‘%’ operator gives the remainder after performing integer division.

In [17]: 11%2
Out[17]: 1

✓ 11 divided by 2 is 5 with a remainder of 1, so the result here is 1.
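The division operators can be compared in a few lines. This is a short sketch assuming Python 3; the numbers 7 and 2 are arbitrary.

```python
# True division always returns a float in Python 3.
print(7 / 2)        # 3.5

# Floor division rounds towards negative infinity, not towards zero.
print(7 // 2)       # 3
print(-7 // 2)      # -4

# The remainder takes the sign of the divisor.
print(7 % 2)        # 1
print(-7 % 2)       # 1

# divmod() returns the floor quotient and the remainder together.
quotient, remainder = divmod(7, 2)
print(quotient, remainder)   # 3 1
```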

FRACTIONS

To start using fractions, import the fractions module. To define a fraction, create a
Fraction object as

In [18]: import fractions
    ...: fractions.Fraction(1, 2)
Out[18]: Fraction(1, 2)

You can perform all the usual mathematical operations with fractions as

In [19]: fractions.Fraction(1, 2)*2
Out[19]: Fraction(1, 1)
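One reason to reach for fractions is that their arithmetic is exact, whereas floats carry small rounding errors. A minimal sketch of the difference, with variable names chosen only for illustration:

```python
from fractions import Fraction

# Floats carry tiny rounding errors...
float_sum_is_exact = (0.1 + 0.2 == 0.3)                      # False

# ...while Fraction arithmetic is exact.
fraction_sum = Fraction(1, 10) + Fraction(2, 10)
fraction_sum_is_exact = (fraction_sum == Fraction(3, 10))    # True

# Fractions are reduced automatically and can be converted to float.
third_plus_sixth = Fraction(1, 3) + Fraction(1, 6)           # Fraction(1, 2)
as_float = float(third_plus_sixth)                           # 0.5
```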

TRIGONOMETRY

You can also do basic trigonometry in Python.

In [20]: import math
    ...: math.pi
Out[20]: 3.141592653589793

In [21]: math.sin(math.pi / 2)
Out[21]: 1.0
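The functions in the math module work in radians, which is worth keeping in mind when angles are given in degrees. The sketch below uses a hypothetical angle of 30 degrees and compares floating point results with a tolerance rather than with ==.

```python
import math

# The trigonometric functions expect radians, so convert degrees first.
angle_deg = 30.0                       # hypothetical angle, e.g. a latitude
angle_rad = math.radians(angle_deg)

sine = math.sin(angle_rad)             # close to 0.5
cosine = math.cos(angle_rad)

# Floating point results are rarely exact, so compare with a tolerance.
print(math.isclose(sine, 0.5))                      # True
print(math.isclose(sine**2 + cosine**2, 1.0))       # True
print(math.degrees(math.pi))                        # 180.0
```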

2.1.3. STRINGS

In Python 3, all strings are sequences of Unicode characters. A string is an immutable
sequence and cannot be modified in place.

• To create a string, enclose it in quotes. Python strings can be defined with either
single quotes (' ') or double quotes (" ").

In [22]: s = 'sujan'

In [23]: s = "sujan"

• The built-in len() function returns the length of the string, i.e. the number of
characters.

In [24]: len(s)
Out[24]: 5

• You can get individual characters out of a string using index notation. Indices start at 0.

In [25]: s[1]
Out[25]: 'u'

• You can concatenate strings using the + operator.

In [26]: s + ' ' + 'koirala'
Out[26]: 'sujan koirala'

✓ Even a space has to be specified as a string, ' '.
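Because strings are immutable, "changing" a string always means building a new one. A minimal sketch of what that looks like in practice; the capitalised name is just an example:

```python
s = 'sujan'

# Strings are immutable: assigning to an index raises a TypeError.
try:
    s[0] = 'S'
    modified_in_place = True
except TypeError:
    modified_in_place = False

# Instead, build a new string from pieces of the old one.
capitalised = 'S' + s[1:]
full_name = capitalised + ' ' + 'Koirala'

print(modified_in_place)  # False
print(full_name)          # Sujan Koirala
print(s)                  # sujan (the original is unchanged)
```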

2.1.4. BYTES

An immutable sequence of numbers between 0 and 255 is called a bytes object. Each
byte within the bytes object can be an ASCII character or an encoded hexadecimal
number from \x00 to \xff (0–255).

• To define a bytes object, use the b' ' syntax. This is commonly known as “byte
literal” syntax.

In [27]: by = b'abcd\x65'
    ...: by
Out[27]: b'abcde'

✓ \x65 is 'e'.

• Just like strings, you can use the len() function and the + operator to concatenate
bytes objects. But you cannot join strings and bytes.

In [28]: len(by)
Out[28]: 5

In [29]: by += b'\x66'
    ...: by
Out[29]: b'abcdef'
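To move between strings and bytes, the text has to be encoded or decoded explicitly. The sketch below assumes UTF-8 as the encoding, which is a common choice but an assumption here.

```python
text = 'sujan'

# encode() turns a string into bytes; decode() turns bytes back into a string.
encoded = text.encode('utf-8')        # b'sujan'
decoded = encoded.decode('utf-8')     # 'sujan'

# Mixing the two types directly raises a TypeError.
try:
    mixed = encoded + text
    could_mix = True
except TypeError:
    could_mix = False

# Indexing a bytes object returns an integer between 0 and 255.
first_byte = encoded[0]               # 115, the code for 's'
```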

2.2. COMBINED DATA TYPES

The basic data types explained in the previous section can be arranged in sequences
to create combined data types. Some combined data types, e.g. lists, can be modified,
while others, e.g. tuples, are immutable and cannot be modified. This section provides
a brief description of these data types and the common operations that can be used with them.


2.2.1. LISTS

A list is an ordered sequence of data. It can hold different types
of data (strings, numbers etc.) and it can be modified to add new data or remove old
data.

CREATING A LIST

To create a list, use square brackets “[ ]” to wrap a comma-separated list of values of
any data type.

In [30]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]
    ...: a_list
Out[30]: ['a', 'b', 'mpilgrim', 'z', 'example', 2]

✓ All items except the last are strings. The last one is an integer.

In [31]: a_list[0]
Out[31]: 'a'

✓ List items can be accessed using an index.

In [32]: type(a_list[0])
Out[32]: str

In [33]: type(a_list[-1])
Out[33]: int

✓ The type of an item can be checked using type().

SLICING A LIST

Once a list has been created, a part of it can be taken as a new list. This is called
slicing the list. A slice can be extracted using indices. Let’s consider the same list as
above:

In [34]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

• The length of the list can be obtained as:

In [35]: len(a_list)
Out[35]: 6

✓ The index can be from 0 to 5 if we count from left to right, or -1 to -6 if
we count from right to left.

• We can obtain a slice as a new list. The slice a_list[0:3] contains the items at
indices 0, 1, and 2; the item at the end index 3 is not included.

In [36]: b_list = a_list[0:3]
    ...: b_list
Out[36]: ['a', 'b', 'mpilgrim']
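Slices also accept negative indices, omitted bounds, and a step. A few more variations on the same list, as a short sketch:

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

first_three = a_list[:3]      # ['a', 'b', 'mpilgrim'], start defaults to 0
last_two = a_list[-2:]        # ['example', 2], negative indices count from the end
every_other = a_list[::2]     # ['a', 'mpilgrim', 'example'], step of 2
reversed_list = a_list[::-1]  # [2, 'example', 'z', 'mpilgrim', 'b', 'a']

# A slice is a new list, so changing it leaves the original untouched.
first_three.append('new')
print(len(a_list))            # 6
```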

ADDING ITEM TO A LIST

There are 4 different ways to add an item or items to a list. Let’s consider the same list as
above:

In [37]: a_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2]

1. ‘+’ operator: The + operator concatenates lists to create a new list. A list
can contain any number of items; there is no size limit.

In [38]: b_list = a_list + ['Hydro', 'Aqua']
    ...: b_list
Out[38]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua']

2. append(): The append() method adds a single item to the end of the list. Even
if the added item is a list, the whole list is added as a single item in the old list.

In [39]: b_list.append(True)
    ...: b_list
Out[39]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

✓ This list has strings, integer, and boolean data.

In [40]: len(b_list)
Out[40]: 9

In [41]: b_list.append(['d', 'e'])
    ...: b_list
Out[41]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True, ['d', 'e']]

In [42]: len(b_list)
Out[42]: 10

✓ The length of b_list has increased by only one even though two items,
['d', 'e'], were added.

3. extend(): Similar to append(), but each item is added separately. For example, let’s
consider the list

In [43]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]
    ...: len(b_list)
Out[43]: 9

In [44]: b_list.extend(['d', 'e'])
    ...: b_list
Out[44]: ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True, 'd', 'e']

In [45]: len(b_list)
Out[45]: 11

✓ The length of b_list has increased by two as the two items in the list, ['d',
'e'], were added separately.

4. insert(): The insert() method inserts a single item into a list. The first argument
is the index of the first item in the list that will get bumped out of position.

In [46]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]
    ...: b_list.insert(0, 'd')

✓ Insert 'd' in the first position, i.e., index 0.

In [47]: b_list
Out[47]: ['d', 'a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

In [48]: b_list.insert(0, ['x', 'y'])

In [49]: b_list
Out[49]: [['x', 'y'], 'd', 'a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', True]

✓ The list ['x', 'y'] is added as one item, as in the case of append().

SEARCH FOR ITEM IN A LIST

Consider the following list:

In [50]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• count() can be used to count how often a value occurs, as in the case of strings.

In [51]: b_list.count('b')
Out[51]: 2

• in can be used to check if a certain value exists in a list.

In [52]: 'b' in b_list
Out[52]: True

In [53]: 'c' in b_list
Out[53]: False

✓ The output is boolean data, i.e., True or False.

• index() can be used to find the index of the searched value.

In [54]: b_list.index('a')
Out[54]: 0

In [55]: b_list.index('b')
Out[55]: 1

✓ Even though there are 2 'b', the index of the first 'b' is returned.
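Note that index() raises a ValueError when the value is not in the list, so it is often combined with in. The sketch below wraps this in a small helper; the function name find_index is hypothetical and not part of Python.

```python
b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

def find_index(items, value):
    """Return the index of the first match, or None if value is absent."""
    if value in items:
        return items.index(value)
    return None

# index() also accepts a start position, which helps to find later matches.
first_b = b_list.index('b')                 # 1
second_b = b_list.index('b', first_b + 1)   # 8
missing = find_index(b_list, 'c')           # None
```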


REMOVING ITEM FROM A LIST

There are many ways to remove an item from a list. The list automatically adjusts its
size after an element has been removed.

REMOVING ITEM BY INDEX

The del statement removes an item from a list if the index of the element that needs
to be removed is provided.

• Consider the following list:

In [56]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• Suppose we want to remove the element 'mpilgrim' from the list. Its index is 2.

In [57]: b_list[2]
Out[57]: 'mpilgrim'

In [58]: del b_list[2]
    ...: b_list
Out[58]: ['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

✓ 'mpilgrim' is now removed.

The pop() method can also remove an item by specifying an index. But, it is even
more versatile as it can be used without any argument to remove the last item of a
list.

• Consider the following list:

In [59]: b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• Suppose we want to remove the element 'mpilgrim' from the list. Its index is 2.

In [60]: b_list[2]
Out[60]: 'mpilgrim'

In [61]: b_list.pop(2)
Out[61]: 'mpilgrim'

✓ The removed item is returned and displayed.

• Now the b_list is as follows:

['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

• If pop() is used without an argument:

In [62]: b_list.pop()
Out[62]: 'b'

• Now the b_list is as follows:

['a', 'b', 'z', 'example', 2, 'Hydro', 'Aqua']

✓ The last 'b' is removed from the list.

• If pop() is used once again, the list will be as follows:

['a', 'b', 'z', 'example', 2, 'Hydro']

REMOVING ITEM BY VALUE

The remove() method removes an item from a list if the value of the item is
specified. Only the first matching item is removed.

• Suppose we want to remove the element 'b' from the list.

In [64]: b_list.remove('b')

In [65]: b_list
Out[65]: ['a', 'z', 'example', 2, 'Hydro', 'Aqua']

✓ The 'b' is now removed from the list. If the list had contained more than one 'b',
only the first one would have been removed.
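How remove() behaves when a value occurs more than once can be checked with a list that contains two 'b' items. The sketch below also shows one common way to drop every occurrence, using a list comprehension; the variable names are chosen only for illustration.

```python
b_list = ['a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

# remove() deletes only the first matching item.
b_list.remove('b')
print(b_list)     # ['a', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b']

# To drop every occurrence, build a new list without the value.
without_b = [item for item in b_list if item != 'b']
print(without_b)  # ['a', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua']

# remove() raises a ValueError if the value is not in the list.
try:
    without_b.remove('b')
    raised = False
except ValueError:
    raised = True
```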


2.2.2. TUPLES

A tuple is an immutable list. A tuple cannot be changed or modified in any way once it
is created.

• A tuple is defined in the same way as a list, except that the whole set of elements
is enclosed in parentheses instead of square brackets.

• The elements of a tuple have a defined order, just like a list. Tuple indices are
zero based, just like a list, so the first element of a non-empty tuple is always
t[0].

• Negative indices count from the end of the tuple, just as with a list.

• Slicing works too, just like a list. Note that when you slice a list, you get a new
list; when you slice a tuple, you get a new tuple.

• Tuples are used because they are faster than lists.
If you do not need to modify a collection of items, a tuple can be used instead of a list.

CREATING TUPLES

A tuple can be created just like a list, but parentheses “( )” have to be used instead of square brackets “[ ]”. For example,

In [66]: a_tuple=('a','b','mpilgrim','z','example', 2,'Hydro','Aqua','b')

TUPLE OPERATIONS

All the list operations except the ones that modify the list itself can be used for tuples

too. For e.g., you cannot use append(), extend(), insert(), del, remove(), and pop() for

tuples. For other operations, please follow the same steps as explained in the previous

section. Here are some examples of tuple operations.

• Consider the following tuple:

In [67]: 1 a_tuple =( 'a','b','mpilgrim ','z','example ', 2,'Hydro ','Aqua ','b'

)

In [68]: 1 a_tuple . index ('z')

Out[68]: 1 3


Xitem 'z' is at the index 3, i.e., it is the fourth element of the tuple.

In [69]: 1 b_tuple = a_tuple [0:4]

In [70]: 1 b_tuple

Out[70]: 1 ('a','b','mpilgrim ','z')

✓ A new tuple is created by slicing; the original tuple does not change.

In [71]: 1 a_tuple

Out[71]: 1 ('a','b','mpilgrim ','z','example ', 2,'Hydro ','Aqua ','b')
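The following sketch (Python 3) repeats the tuple operations above and shows what happens when a modification is attempted. The variable names position and modified are illustrative only.

```python
a_tuple = ('a', 'b', 'mpilgrim', 'z', 'example', 2, 'Hydro', 'Aqua', 'b')

position = a_tuple.index('z')   # read-only operations work as for lists
b_tuple = a_tuple[0:4]          # slicing returns a new tuple

# Any attempt to change an element raises a TypeError.
try:
    a_tuple[0] = 'changed'
    modified = True
except TypeError:
    modified = False

print(position, b_tuple, modified)
```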

2.2.3. SETS

A set is an unordered collection of unique values. A single set can contain values of

any datatype.

CREATING SET

There are basically two ways of creating set.

1. From scratch: Sets can be created like lists but curly brackets “{}” have to be

used instead of square brackets “[ ]”. For e.g.,

In [72]: 1 a_set ={ 'a','b','mpilgrim ','z','example ', 2,'Hydro ','Aqua ','b'}

In [73]: 1 type( a_set )

Out[73]: 1 set

In [74]: 1 a_set

Out[74]: 1 {2, 'Aqua ', 'Hydro ', 'a', 'b', 'example ', 'mpilgrim ', 'z'}

XThe set has different orders than the values given inside {} because it is

unordered and original orders are ignored. Also, there is only one 'b' in the set

even though two 'b' were given because a set is a collection of unique values.

Duplicate values are taken as one.


2. From list or tuple: A set can be created from a list or tuple as,

In [75]: 1 set( a_list )

2 set( a_tuple )

MODIFYING SET

A set can be modified by adding an item or another set to it. Also, items of set can

be removed.

ADDING ELEMENTS

• Consider a set as follows,

In [76]: 1 a_set ={2 , 'Aqua ','Hydro ','a','b','example ','mpilgrim ','z'}

• Using add: To add single item to a set.

In [77]: 1 a_set .add('c')

In [78]: 1 a_set

Out[78]: 1 {2,'Aqua ','Hydro ','a','b','c','example ','mpilgrim ','z'}

✓ 'c' is added to the set. Because a set is unordered, the position at which it is displayed has no meaning.

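• Using update: To add multiple items given as a set, list, or tuple.

In [79]: a_set.update(['a','Sujan','Koirala'])

In [80]: a_set

Out[80]: {2,'Aqua','Hydro','Koirala','Sujan','a','b','c','example','mpilgrim','z'}

✓ 'Koirala' and 'Sujan' are added, but 'a' is not added again because it is already in the set. The items must be passed inside a list, tuple, or set; a bare string such as 'Sujan' would be split into its individual characters.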

REMOVING ELEMENTS

• Consider a set as follows,

In [81]: a_set={2,'Aqua','Hydro','Koirala','Sujan','a','b','c','example','mpilgrim','z'}

• Using remove() and discard(): These are used to remove an item from a set. If the item is not in the set, remove() raises a KeyError, whereas discard() does nothing.

In [82]: a_set.remove('b')

In [83]: a_set

Out[83]: {2,'Aqua','Hydro','Koirala','Sujan','a','c','example','mpilgrim','z'}

✓ 'b' has been removed.

In [84]: a_set.discard('Hydro')

In [85]: a_set

Out[85]: {2,'Aqua','Koirala','Sujan','a','c','example','mpilgrim','z'}

• Using pop() and clear(): pop() works as for a list, except that it cannot remove the "last" item because a set has no order. Instead, it removes and returns one arbitrary item. clear() removes all items, leaving an empty set.

In [86]: 1 a_set .pop ()

In [87]: 1 a_set

Out[87]: 1 {2,'Koirala ','Sujan ','a','c','example ','mpilgrim ','z'}

SET OPERATIONS

Two sets can be combined (union), or the elements common to two sets can be collected (intersection), to form a new set. These operations are also useful for combining two or more lists after converting them to sets.

• Consider following two sets,

In [88]: 1 a_set ={2 ,4 ,5 ,9 ,12 ,21 ,30 ,51 ,76 ,127 ,195}

2 b_set ={1 ,2 ,3 ,5 ,6 ,8 ,9 ,12 ,15 ,17 ,18 ,21}

• Union: Can be used to combine two sets.

In [89]: 1 c_set = a_set . union ( b_set )

In [90]: 1 c_set

Out[90]: 1 {1 ,2 ,195 ,4 ,5 ,6 ,8 ,12 ,76 ,15 ,17 ,18 ,3 ,21 ,30 ,51 ,9 ,127}


• Intersection: Can be used to create a set with elements common to two sets.

In [91]: 1 d_set = a_set . intersection ( b_set )

In [92]: 1 d_set

Out[92]: 1 {9 ,2 ,12 ,5 ,21}
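The set behaviour described in this section can be checked with a short sketch (Python 3). The sets a_set and b_set are the ones used above; names, chars, and raised are illustrative names introduced here.

```python
a_set = {2, 4, 5, 9, 12, 21, 30, 51, 76, 127, 195}
b_set = {1, 2, 3, 5, 6, 8, 9, 12, 15, 17, 18, 21}

c_set = a_set.union(b_set)          # every element of either set
d_set = a_set.intersection(b_set)   # only the elements found in both

# update() expects an iterable of items; a bare string is split into characters.
names = {'a'}
names.update(['a', 'Sujan', 'Koirala'])
chars = set()
chars.update('ab')

# discard() ignores a missing item, remove() raises KeyError.
names.discard('missing')
try:
    names.remove('missing')
    raised = False
except KeyError:
    raised = True

print(sorted(d_set), len(c_set), raised)
```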

2.2.4. DICTIONARIES

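A dictionary is an unordered set of key-value pairs (since Python 3.7, a dictionary keeps the order in which the keys were inserted). A value can be retrieved directly for a known key, but a key cannot be looked up directly from its value.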

CREATING DICTIONARY

Creating a dictionary is similar to creating a set in using curly brackets “{}”, but key:value pairs are used instead of values. The following is an example,

In [93]: a_dict={'Hydro':'131.112.42.40','Aqua':'131.112.42.41'}

In [94]: a_dict

Out[94]: {'Aqua':'131.112.42.41','Hydro':'131.112.42.40'}

✓ The displayed order may differ from the order of definition, as with a set.

In [95]: a_dict['Hydro']

Out[95]: '131.112.42.40'

✓ Key 'Hydro' can be used to access the value '131.112.42.40'.

MODIFYING DICTIONARY

Since the size of the dictionary is not fixed, new key:value pair can be freely added to

the dictionary. Also values for a key can be modified.

• Consider the following dictionary.

In [96]: a_dict={'Aqua':'131.112.42.41','Hydro':'131.112.42.40'}

• If you want to change the value of 'Aqua',

In [97]: a_dict['Aqua']='192.168.1.154'

In [98]: a_dict

Out[98]: {'Aqua':'192.168.1.154','Hydro':'131.112.42.40'}

• If you want to add new item to the dictionary,

In [99]: 1 a_dict ['Lab ']= 'Kanae '

In [100]: 1 a_dict

Out[100]: 1 {'Aqua ':'192.168.1.154 ','Hydro ':'131.112.42.40 ','Lab ':'Kanae '}

• Dictionary values can also be lists instead of single values. For e.g.,

In [101]: 1 k_lab ={ 'Female ':[ 'Yoshikawa ','Imada ','Yamada ','Sasaki ','

Watanabe ','Sato '],'Male ':[ 'Sujan ','Iseri ','Hagiwara ','

Shiraha ','Ishida ','Kusuhara ','Hirochi ','Endo ']}

In [102]: 1 k_lab ['Female ']

Out[102]: 1 ['Yoshikawa ','Imada ','Yamada ','Sasaki ','Watanabe ','Sato ']
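A compact sketch of the dictionary operations above (Python 3). The reverse lookup at the end is not a dictionary feature; it is an assumed workaround that searches all items, and keys_for_value is an illustrative name.

```python
a_dict = {'Hydro': '131.112.42.40', 'Aqua': '131.112.42.41'}

a_dict['Aqua'] = '192.168.1.154'   # change the value of an existing key
a_dict['Lab'] = 'Kanae'            # add a new key:value pair

# get() returns None (or a chosen default) instead of raising KeyError.
missing = a_dict.get('Missing')

# A key cannot be looked up from a value directly; the items must be searched.
keys_for_value = [k for k, v in a_dict.items() if v == '131.112.42.40']

print(a_dict['Aqua'], missing, keys_for_value)
```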

2.2.5. ARRAYS

Arrays are similar to lists, but they contain homogeneous data, i.e., data of the same type only. Arrays are commonly used to store numbers and are hence used in mathematical calculations.

CREATING ARRAYS

Python arrays can be created in many ways. They can also be read from a data file in text or binary format, as explained in later chapters of this guide. Here, some commonly used methods are explained. For a detailed tutorial on Python arrays, refer to the NumPy tutorial:

http://www.scipy.org/Tentative_NumPy_Tutorial#head-d3f8e5fe9b903f3c3b2a5c0dfceb60d71602cf93

1. From list: Arrays can be created from a list or a tuple using:

✓ somearray=array(somelist). Consider the following examples.

In [103]: b_list=['a','b',1,2]

✓ The list has mixed datatypes. The first two items are strings and the last two are numbers.

In [104]: b_array=array(b_list)
          b_array

Out[104]: array(['a','b','1','2'], dtype='|S8')

✓ Since the first two elements are strings, the numbers are also converted to strings when the array is created.

In [105]: 1 b_list2 =[1 ,2 ,3 ,4]

XAll items are numbers.

In [106]: 1 b_array2 = array ( b_list2 )

In [107]: 1 b_array2

Out[107]: 1 array ([1 , 2, 3, 4])

XNumeric array is created. Mathematical operations like addition, subtrac-

tion, division, etc. can be carried in this array.

2. Using built-in functions:

(a) From direct values:

In [108]: 1 xx = array ([2 , 4, -11])

✓ xx is a one-dimensional array of length 3; its shape is (3,).

(b) From arange(number): Creates an array from the range of values. Ex-

amples are provided below. For details of arange follow chapter 4.

In [109]: 1 yy= arange (2 ,5 ,1)

In [110]: 1 yy

Out[110]: 1 array ([2 ,3 ,4])

XCreates an array from lower value (2) to upper value (5) in specified

interval (1) excluding the last value (5).

In [111]: 1 yy= arange (5)


In [112]: 1 yy

Out[112]: 1 array ([0 ,1 ,2 ,3 ,4])

XIf the lower value and interval are not specified, they are taken as 0

and 1, respectively.

In [113]: 1 yy= arange (5 ,2 , -1)

In [114]: 1 yy

Out[114]: 1 array ([5 ,4 ,3])

XThe interval can be negative.

(c) Arrays of fixed shape: Sometimes it is necessary to create an array to store the result of a calculation. The functions zeros(someshape) and ones(someshape) can be used to create arrays with all values as zero or one, respectively.

In [115]: zz=zeros(20)

✓ will create an array with 20 zeros.

In [116]: zz=zeros((20,20))

✓ will create an array with 20 rows and 20 columns (total 20*20=400 elements) with all elements as zero. Note that a shape with more than one dimension must be passed as a tuple.

In [117]: zz=zeros((20,20,20))

✓ will create an array with 20 blocks with each block having 20 rows and 20 columns (total 20*20*20=8000 elements) with all elements as zero.
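The creation methods above can be tried in one short sketch. It assumes NumPy is installed and imported as np (the listings in this guide call array, arange, and zeros directly); the variable mixed is an illustrative name.

```python
import numpy as np

xx = np.array([2, 4, -11])          # from direct values: shape (3,)
yy = np.arange(5, 2, -1)            # 5, 4, 3 (the upper value 2 is excluded)
zz = np.zeros((20, 20))             # the shape is passed as a tuple
mixed = np.array(['a', 'b', 1, 2])  # numbers are converted to strings

print(xx.shape, yy, zz.shape, mixed.dtype.kind)
```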

ARRAY OPERATIONS

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [118]: a=array([20,30,40,50])
          b=arange(4)

In [119]: b

Out[119]: array([0,1,2,3])

In [120]: c=a-b

In [121]: c

Out[121]: array([20, 29, 38, 47])

✓ Each element in b is subtracted from the respective element in a.

In [122]: b**2

Out[122]: array([0, 1, 4, 9])

✓ Square of each element in b.

In [123]: 10*sin(a)

Out[123]: array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])

✓ Two operations can be carried out at once.

In [124]: a<35

Out[124]: array([ True, True, False, False], dtype=bool)

✓ 'True' if a<35 otherwise 'False'.

3 INPUT/OUTPUT OF FILES

Read and write data from/to files in commonly used data formats, such as

text (csv), binary, excel, netCDF, R data frame and Matlab.


This chapter explains the method to read and write data from/to commonly used

data formats, such as text (csv), binary, excel, netCDF, and Matlab.

3.1. READ TEXT FILE

Small datasets are often stored in a structured or unstructured text format. Python

libraries are able to read these data files in several ways.

3.1.1. PLAIN TEXT

First, we will load data from a free-format structured text file (e.g., ASCII data; see http://en.wikipedia.org/wiki/ASCII).

In [125]: a=loadtxt('example_plain.txt', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None)

This reads the data in the text file as an 'array'. An error is raised if the data are not numeric (float or integer).

3.1.2. COMMA SEPARATED TEXT

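Often the data values in a text file are separated by special characters such as a tab or a comma. The separator is specified with the 'delimiter' option of loadtxt, so that it is not read as part of the data. The 'converters' option maps a column index to a function that is applied to the values of that column; in the example below, the first column (index 0) is passed through datestr2num (available in matplotlib.dates).

In [126]: a=loadtxt('example_csv.csv', delimiter=',', converters={0:datestr2num})

A full list of options of loadtxt is available at https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html.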
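As a small self-contained sketch of loadtxt, the example below reads comma-separated values from an in-memory text buffer instead of a file, so that it can be run without any data file. It assumes NumPy is imported as np; the values in text are made up for illustration.

```python
import io
import numpy as np

# Made-up comma-separated data with one comment line.
text = u"# three columns of values\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# Lines starting with the comment character are skipped.
data = np.loadtxt(io.StringIO(text), comments='#', delimiter=',')

# usecols selects which columns are read.
first_two = np.loadtxt(io.StringIO(text), delimiter=',', usecols=(0, 1))

print(data.shape, first_two.shape)
```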

3.1.3. UNSTRUCTURED TEXT

If the text data is unstructured, extra work is needed to read the file and to process it before it can be saved as an array.

• First the file has to be opened as a file object. In Python 2, either file() or open() can be used; in Python 3, only open() is available:

In [127]: a=file('filename')

In [128]: a=open('filename')

In [129]: type(a)

http://en.wikipedia.org/wiki/ASCII

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html


Out[129]: 1 file

• Extracting data from the file as a ’list’:

In [130]: 1 a_list =a. readlines ()

✓ readlines() reads each line of the file object 'a' and stores the lines as items of the list a_list.

• Extracting data from the file as a ’string’:

In [131]: 1 a_str =a.read ()

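✓ read() reads the whole content of the file object 'a' and stores it as a single string. Note that a file object is read only once from start to end; reopen the file (or call a.seek(0)) before using read() after readlines().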

Lines in text files end with special characters such as the newline '\n' (or '\r\n' on Windows). These characters need to be removed from each line/item of the data obtained with read or readlines.

• Drop the '\n' or '\r\n' sign at the end of each line. strip() is used to remove these characters. To drop them from each element of a_list:

In [132]: b=[s.strip() for s in a_list]

• Furthermore, to convert each element into float:

In [133]: b=[float(s.strip()) for s in a_list]
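The steps above can be put together as follows (Python 3). The sketch first writes a small temporary file so that it is runnable on its own; the file name and its contents are made up, and skipping empty lines is an extra assumption not discussed in the text.

```python
import os
import tempfile

# Create a small example file (hypothetical name and content).
path = os.path.join(tempfile.mkdtemp(), 'example_unstructured.txt')
with open(path, 'w') as f:
    f.write('1.5\n2.5\n\n3.0\n')

# Open the file and read every line into a list.
with open(path) as f:
    a_list = f.readlines()

# Strip the newline characters, skip empty lines, and convert to float.
values = [float(s.strip()) for s in a_list if s.strip()]

print(a_list)
print(values)
```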

3.2. SAVE TEXT FILE

• To save an array 'a',

In [134]: savetxt(filename, a, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')

A full list of options of savetxt is available at https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html.

https://docs.python.org/2.0/ref/strings.html

https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
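A minimal round trip with savetxt and loadtxt is sketched below, assuming NumPy is imported as np. The file name, format string, and header text are arbitrary choices for illustration.

```python
import os
import tempfile
import numpy as np

a = np.array([[1.0, 2.5], [3.0, 4.5]])
path = os.path.join(tempfile.mkdtemp(), 'example_saved.txt')

# The header is written as a comment line, prefixed with '# ' by default.
np.savetxt(path, a, fmt='%.3f', delimiter=',', header='col1,col2')

# loadtxt skips the comment line and returns the numeric values.
b = np.loadtxt(path, delimiter=',')

with open(path) as f:
    first_line = f.readline().strip()

print(first_line)
print(b)
```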


Table 3.1: Data type of the returned array

Type code   C Type           Python Type         Minimum size in bytes
'c'         char             character           1
'b'         signed char      int                 1
'B'         unsigned char    int                 1
'u'         Py_UNICODE       Unicode character   2
'h'         signed short     int                 2
'H'         unsigned short   int                 2
'i'         signed int       int                 2
'I'         unsigned int     long                2
'l'         signed long      int                 4
'L'         unsigned long    long                4
'f'         float            float               4
'd'         double           float               8

3.3. READ BINARY DATA

Binary format is used because it needs fewer bytes to store each value than text, which makes it efficient in storage and memory use. This section explains the procedure of reading data in binary format using the NumPy function fromfile.

In [135]: dat=fromfile('filename','type code')

• filename is the name of the file.

• type code: can be defined as type code (e.g., 'f') or python type (e.g., 'float') as shown in Table 3.1. It determines the size and byte-order of items in the binary file.

In [136]: dat=fromfile('example_binary.float32','f')
          dat


3.4. WRITE BINARY DATA

• To write/save all items (as machine values) of an array "A" to a file:

In [137]: 1 A. tofile ('filename ')

• can also include the data type as,

In [138]: 1 A. astype ('f'). tofile ('filename ')
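Writing and reading binary data can be checked with a small round trip, sketched below under the assumption that NumPy is imported as np. The file name is made up. Because a binary file stores no information about its data type, the same type code has to be used for writing and reading.

```python
import os
import tempfile
import numpy as np

A = np.array([1.5, 2.5, 3.5])   # float64 by default
path = os.path.join(tempfile.mkdtemp(), 'example_binary.float32')

# Write as 4-byte floats ('f'), then read back with the same type code.
A.astype('f').tofile(path)
dat = np.fromfile(path, 'f')

n_bytes = os.path.getsize(path)  # 3 values * 4 bytes each

print(dat, dat.dtype, n_bytes)
```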

3.5. READ NETCDF DATA

NetCDF data files can be read by several packages such as Scientific, Scipy, and NetCDF4. Below is an example of reading a netCDF file using the io module of Scipy.

In [139]: from scipy.io import netcdf
          ncf=netcdf.netcdf_file('example_netCDF.nc')
          ncf.variables
          dat=ncf.variables['wbal_clim_CUM'][:]

3.6. WRITE NETCDF DATA

A short example of how to create netCDF data is below. For details, refer to the original Scipy help page (https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.io.netcdf.netcdf_file.html).

In [140]: import numpy as np
          from scipy.io import netcdf
          f=netcdf.netcdf_file('simple.nc', 'w')
          f.history='Created for a test'
          f.createDimension('time', 10)
          time=f.createVariable('time', 'i', ('time',))
          time[:]=np.arange(10)
          time.units='days since 2008-01-01'
          f.close()

3.7. READ MATLAB DATA

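MATLAB data files saved in the version 7.3 format are HDF5 files, and can be read by using the Python interface for HDF5 datasets. This requires installation of the h5py package (http://www.h5py.org). Files saved in older MATLAB formats can be read with scipy.io.loadmat instead.

In [141]: import h5py
          a=h5py.File('example_matlab.mat')

In [142]: a.keys()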

https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.io.netcdf.netcdf_file.html

http://www.h5py.org


Out[142]: 1 [u'#refs#', u'Results ']

In [143]: 1 a['Results ']. keys ()

Out[143]: 1 [u'SimpBM ', u'SimpBM2L ', u'SimpBMtH ', u'SimpGWoneTfC ', u'SimpGWvD ']

In [144]: dat=a['Results/SimpGWvD/Default/ModelOutput/actET'][:]
          dat=a['Results']['SimpGWvD']['Default']['ModelOutput']['actET'][:]

3.8. READ EXCEL DATA

Excel workbooks (.xlsx files) created by MS Office 2010 or later can be read using the openpyxl package (https://openpyxl.readthedocs.io/en/default/).

In [145]: from openpyxl import load_workbook
          ex_f=load_workbook('example_xls.xlsx')
          ex_f.sheetnames
          a_sheet=ex_f['Belleville_96-pr']

https://openpyxl.readthedocs.io/en/default/

4 DATA OPERATIONS IN PYTHON

Information on common mathematical and simple statistical operation on

data.


4.1. SIZE AND SHAPE

All data are characterized by two things: how big they are (size), and how they are

arranged (shape). Here are some useful commands to play with the size and shape of

data.

We will use the following list as an example:

In [146]: a=[1,2,3,4,5,6,7,8,9,10]

• Check the size (total number of elements) of the data:

In [147]: size(a)

Out[147]: 1 10

• Check the shape of the data:

In [148]: 1 shape (a)

Out[148]: 1 (10 ,)

Xfor list.

In [149]: 1 array (a). shape

Out[149]: 1 (10 ,)

Xfor array.

XNote that the order is number of rows (longitudinal direction↓), number

of columns (lateral direction→) for 2-dimensional arrays in python.

• Change the arrangement of the data:

In [150]: b=array(a).reshape(2,5)

✓ reshape() as a method can be used on arrays only, so the list is first converted to an array.

In [151]: b=array([[1,2,3,4,5],[6,7,8,9,10]])

In [152]: b=reshape(a,(2,5))

✓ The reshape() function can be used for both array and list. A list is converted to an array by this function.

In [153]: b=array(a).reshape(-1,5)

✓ By using the '-1' flag, the first dimension is automatically calculated from the total size of the array. For example, if there are 10 elements in an array/list and 5 columns are specified during reshape, the number of rows is automatically calculated as 2. The shape will be (2,5).

• Convert the data type to array:

In [154]: b=array(a)

• Convert an array back to a list:

In [155]: c=b.tolist()

• Convert the data into float and integer:

In [156]: float(a[0])
          int(a[0])

✓ These functions can only be used for one element at a time. To convert a whole array, use astype() (see Section 4.6).
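The size and shape commands of this section are collected in the sketch below, assuming NumPy is imported as np. The names n_items, back, and as_float are illustrative.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n_items = np.size(a)               # total number of elements
b = np.reshape(a, (2, 5))          # the function accepts a list
c = np.array(a).reshape(-1, 5)     # the method needs an array; -1 is inferred

back = c.tolist()                  # array -> nested list
as_float = np.array(a).astype(float)   # convert all elements at once

print(n_items, b.shape, c.shape)
print(back)
```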

4.2. SLICING AND DICING

This section explains how to extract data from an array or a list. The following process

can be used to take data for a region from global data, or for a limited period from

long time series data. The process is called ‘slicing’.

The same method can be used for arrays and lists. Let's consider the following list,

In [157]: a=[1,2,3,4,5]

XThere are five items in the list.

INDEX BASICS

Indexing is done in two ways:

1. Positive Index: The counting order is from left to right. The index for the first

element is 0 (not 1).


In [158]: 1 a[0]

Out[158]: 1 1

In [159]: 1 a[1]

Out[159]: 1 2

In [160]: 1 a[4]

Out[160]: 1 5

XThe fifth item (index=4) is 5.

2. Negative Index: The counting order is from right to left. The index for the last

item is -1. In some cases, the list is very long and it is much easier to count

from the end rather than the beginning.

In [161]: 1 a[ -1]

Out[161]: 1 5

XIt is same as a[4] as shown above.

In [162]: 1 a[ -2]

Out[162]: 1 4

DATA EXTRACTION

Data extraction is carried out by using indices. In this section, some examples of using indices are provided. Details of array indexing and slicing can be found at https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html.

1. Using two indices:

In [163]: 1 somelist [ first index :last index :( interval )]

In [164]: 1 a [0:2]

Out[164]: 1 [1 ,2]

Xa[0] and a[1] are included but a[2] is not included.

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html


In [165]: a[3:4]

Out[165]: [4]

✓ A slice always returns a list, even when it contains a single item.

2. Using single index:

In [166]: 1 a[:2]

Out[166]: 1 [1 ,2]

Xsame as a[0:2].

In [167]: 1 a[2:]

Out[167]: 1 [3 ,4 ,5]

Xsame as a[2:5].

3. Consider a 2-D list and a 2-D array. Indexing works differently in the two cases, as explained below.

In [168]: 1 a_list =[[1 ,2 ,3] ,[4 ,5 ,6]]

2 a_array = array ([[1 ,2 ,3] ,[4 ,5 ,6]])

In [169]: 1 shape ( a_list )

Out[169]: 1 (2 ,3)

In [170]: 1 a_array . shape

Out[170]: 1 (2 ,3)

In [171]: 1 a_list [0]

Out[171]: 1 [1 ,2 ,3]

Xwhich is a list.

In [172]: 1 a_array [0]

Out[172]: 1 array ([1 ,2 ,3])

Xwhich is an array.


4. To extract data from list,

In [173]: 1 a_list [0][1]

Out[173]: 1 2

In [174]: 1 a_list [1][:2]

Out[174]: 1 [4 ,5]

XThe index has to be provided in two different sets of square brackets “[ ]”.

5. To extract data from array,

In [175]: 1 a_array [0 ,1]

Out[175]: 1 2

In [176]: a_array[1,:2]

Out[176]: array([4,5])

✓ The indices can be provided in one set of square brackets “[ ]”, separated by commas.

6. Consider a 3-D list and 3-D array,

In [177]: 1 a_list =[[[2 ,3] ,[4 ,5] ,[6 ,7] ,[8 ,9]] ,[[12 ,

13] ,[14 ,15] ,[16 ,17] ,[18 ,19]]]

2 a_array = array ([[[2 ,3] ,[4 ,5] ,[6 ,7] ,[8 ,9]] ,[[12 ,

13] ,[14 ,15] ,[16 ,17] ,[18 ,19]]])

XThe shape of both data is (2,4,2).

To extract from list,

In [178]: a_list[0][2]

Out[178]: [6,7]

In [179]: a_list[0][2][1]

Out[179]: 7

To extract from array,

In [180]: a_array[0,2]

Out[180]: array([6,7])

In [181]: a_array[0,2,1]

Out[181]: 7
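The indexing rules for nested lists and arrays can be compared side by side in the following sketch (NumPy assumed imported as np). The last line, which extracts values across blocks, is an additional illustration of what array indexing allows and has no direct equivalent for lists.

```python
import numpy as np

a_list = [[[2, 3], [4, 5], [6, 7], [8, 9]],
          [[12, 13], [14, 15], [16, 17], [18, 19]]]
a_array = np.array(a_list)

from_list = a_list[0][2][1]     # one pair of brackets per dimension
from_array = a_array[0, 2, 1]   # all indices inside one pair of brackets

single = [1, 2, 3, 4, 5][3:4]   # a slice of a list is always a list

# Second value of the first pair in each block.
column = a_array[:, 0, 1]

print(a_array.shape, from_list, from_array, single, column)
```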

4.3. BUILT-IN MATHEMATICAL FUNCTIONS

The Python interpreter has a number of functions built into it, and NumPy adds further functions for arrays. This section documents the most commonly used of these functions. Firstly, consider the following 2-D arrays,

In [182]: A=array([[-2,2],[-5,5]])
          B=array([[2,2],[5,5]])
          C=array([[2.53,2.5556],[5.3678,5.4568]])

1. max(iterable): Returns the maximum from the passed elements or if a single

iterable is passed, the max element in the iterable. With two or more arguments,

return the largest value.

In [183]: 1 max ([0 ,10 ,15 ,30 ,100 , -5])

Out[183]: 1 100

In [184]: 1 A.max ()

Out[184]: 1 5

2. min(iterable): Returns the minimum from the passed elements or if a single

iterable is passed, the minimum element in the iterable. With two or more

arguments, return the smallest value.

In [185]: 1 min ([0 ,10 ,15 ,30 ,100 , -5])

Out[185]: 1 -5

In [186]: 1 A.min ()

Out[186]: 1 -5


3. mean(iterable): Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. For details, see http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html.

In [187]: mean([0,10,15,30,100,-5])

Out[187]: 25.0

In [188]: 1 A.mean ()

Out[188]: 1 0.0

4. median(iterable): Returns the median of the array elements. Unlike mean, median is available only as a function, not as an array method.

In [189]: median([0,10,15,30,100,-5])

Out[189]: 12.5

In [190]: median(A)

Out[190]: 0.0

5. sum(iterable): Returns the sum of the array elements. If an axis is specified, it returns the sum of the array elements over that axis; otherwise, it returns the sum of all elements. For details, see http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html.

In [191]: 1 sum ([1 ,2 ,3 ,4])

Out[191]: 1 10

In [192]: 1 A.sum ()

Out[192]: 1 0

6. abs(A): Returns the absolute value of a number, which can be an integer or a

float, or an entire array.

In [193]: 1 abs(A)

Out[193]: 1 array ([[2 ,2] ,[5 ,5]])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html

http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html


In [194]: 1 abs(B)

Out[194]: array([[2, 2],[5, 5]])

7. divmod(x,y): Returns the quotient and remainder resulting from dividing the

first argument (some number x or an array) by the second (some number y or

an array).

In [195]: 1 divmod (2, 3)

Out[195]: 1 (0, 2)

Xas 2 / 3 = 0 and remainder is 2.

In [196]: 1 divmod (4, 2)

Out[196]: 1 (2, 0)

Xas 4 / 2 = 2 and remainder is 0.

In case of two dimensional array data

In [197]: 1 divmod (A,B)

Out[197]: 1 ( array ([[ -1 , 1], [-1, 1]]) , array ([[0 , 0], [0, 0]]))

8. modulo (x%y): Returns the remainder of a division of x by y.

In [198]: 1 5%2

Out[198]: 1 1

9. pow(x,y[, z]): Returns x to the power y. But, if z is present, returns x to

the power y modulo z (more efficient than pow(x, y) % z). The pow(x, y) is

equivalent to x**y.

In [199]: 1 pow(A,B)

Out[199]: 1 array ([[4 , 4], [ -3125 , 3125]])


10. round(x,n): Returns the floating point value of x rounded to n digits after the

decimal point.

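In [200]: round(2.675,2)

Out[200]: 2.67

✓ The result is 2.67 rather than 2.68 because 2.675 cannot be represented exactly as a binary floating point number; the stored value is slightly smaller than 2.675.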

11. around(A,n): Returns the floating point array A rounded to n digits after the

decimal point.

In [201]: 1 around (C ,2)

Out[201]: 1 array ([[ 2.53 , 2.56] , [ 5.37 , 5.46]])

12. range([x],y[,z]): This function creates lists of integers in an arithmetic progression. It is primarily used in for loops. The arguments must be plain integers.

• If the step argument is omitted, it defaults to 1.

• If the start argument (x) is omitted, it defaults to 0.

• The full form returns a list of plain integers [x, x + z, x + 2*z, · · · ,y-z].

• If step (z) is positive, the last element is the ‘start (x) + i * step (z)’ just

less than ‘y’.

• If step (z) is negative, the last element is the ‘start (x) + i * step (z)’ just

greater than ‘y’.

• If step (z) is zero, ValueError is raised.

In [202]: 1 range (10)

Out[202]: 1 [0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9]

In [203]: 1 range (1 ,11)

Out[203]: 1 [1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10]

In [204]: 1 range (0 ,20 ,5)

Out[204]: 1 [0 ,5 ,10 ,15]

http://wiki.python.org/moin/ForLoop


In [205]: 1 range (0,-5,-1)

Out[205]: 1 [0,-1,-2,-3,-4]

In [206]: 1 range (0)

Out[206]: 1 [ ]

13. arange(x,y[,z]): This function creates arrays of values in an arithmetic progression. The arguments work as in range(), but non-integer values are also accepted.

In [207]: 1 arange (10)

Out[207]: 1 array ([0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9])

In [208]: 1 arange (1 ,11)

Out[208]: 1 array ([1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10])

In [209]: 1 arange (0 ,20 ,5)

Out[209]: 1 array ([0 ,5 ,10 ,15])

In [210]: 1 arange (0,-5,-1)

Out[210]: 1 array ([0,-1,-2,-3,-4])

In [211]: 1 arange (0)

Out[211]: 1 array ([ ], dtype = int64 )

14. zip(A,B): Returns a list of tuples, where the i-th tuple contains the i-th element of each argument sequence. The returned list is truncated to the length of the shortest sequence. For a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list. In Python 3, zip returns an iterator; use list(zip(A,B)) to obtain the list.

In [212]: 1 zip(A,B)

Out[212]: 1 [( array ([-2, 2]) , array ([2 , 2])), ( array ([-5, 5]) , array ([5 ,

5]))]


15. sort(): Sorts the array elements in smallest to largest order.

In [213]: 1 D= array ([10 ,2 ,3 ,10 ,100 ,54])

In [214]: 1 D.sort ()

In [215]: 1 D

Out[215]: 1 array ([2 , 3, 10, 10, 54, 100])

16. ravel(): Returns a flattened array. 2-D array is converted to 1-D array.

In [216]: 1 A. ravel ()

Out[216]: 1 array ([-2, 2, -5, 5])

17. transpose(): Returns the transpose of an array (matrix) by permuting the dimensions.

In [217]: 1 A. transpose ()

Out[217]: 1 array ([[ -2 , -5], [ 2, 5]])

18. diagonal(): Returns the specified diagonal of an array (the main diagonal by default).

In [218]: 1 A. diagonal ()

Out[218]: 1 array ([-2, 5])
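The pure-Python built-ins from the list above can be tried without NumPy, as in the sketch below. It is written for Python 3, where range and zip return lazy objects that are converted with list(); the variable names are illustrative.

```python
values = [0, 10, 15, 30, 100, -5]

largest = max(values)
smallest = min(values)
total = sum(values)
average = total / float(len(values))   # the built-ins have no mean()

quotient, remainder = divmod(7, 3)     # 7 = 2*3 + 1
power_mod = pow(2, 10, 1000)           # (2**10) % 1000
rounded = round(2.675, 2)              # 2.675 is stored slightly below 2.675

steps = list(range(0, 20, 5))          # the upper value 20 is excluded
pairs = list(zip([1, 2, 3], 'ab'))     # truncated to the shorter sequence

print(largest, smallest, average, quotient, remainder, power_mod, rounded)
print(steps, pairs)
```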

4.4. MATRIX OPERATIONS

The linear algebra module of NumPy provides a suite of matrix calculations (https://docs.scipy.org/doc/numpy/reference/routines.linalg.html).

1. Dot product:

In [219]: 1 a=rand (3 ,3)

2 b=rand (3 ,3)

3 dot_p =dot(a,b)

Xwhere a and b are two arrays.

https://docs.scipy.org/doc/numpy/reference/routines.linalg.html


2. Cross product:

In [220]: 1 a=rand (3 ,3)

2 b=rand (3 ,3)

3 cro_p = cross (a,b)

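✓ where a and b are two arrays. For two-dimensional arrays, the cross product is calculated between corresponding rows, each row being a vector of length 3.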

3. Matrix multiplication:

In [221]: 1 a=rand (2 ,3)

2 b=rand (3 ,2)

3 mult_ab = matmul (a,b)

In [222]: 1 shape ( mult_ab )

Out[222]: 1 (2 ,2)
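Because rand produces different numbers on every run, the matrix operations are easier to verify with fixed values. The sketch below assumes NumPy is imported as np and uses small hand-checkable arrays chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # shape (3, 2)

mult_ab = np.matmul(a, b)      # (2, 3) times (3, 2) gives (2, 2)
dot_ab = np.dot(a, b)          # identical to matmul for 2-D arrays

# Cross product of the unit vectors along x and y gives the one along z.
cro_p = np.cross([1, 0, 0], [0, 1, 0])

print(mult_ab)
print(cro_p)
```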

4.5. STRING OPERATIONS

Let's assume a string s as,

In [223]: s='sujan koirala'

1. split(): Splits a string into a list of strings based on a delimiter. The delimiter is an optional argument; if it is omitted, the string is split at whitespace.

In [224]: 1 s. split ()

Out[224]: 1 ['sujan ', 'koirala ']

Xblank space as delimiter. creates a list with elements separated at locations

of blank space.

In [225]: 1 s. split ('a')

Out[225]: 1 ['suj ', 'n koir ', 'l', '']

X’a’ as delimiter. creates a list with elements separated at locations of ’a’

2. lower() and upper(): Change the string to lower case and upper case, respectively.

In [226]: 1 s='Sujan Koirala '

2 s

Out[226]: 1 'Sujan Koirala '

In [227]: 1 s. lower ()

Out[227]: 1 'sujan koirala '

In [228]: 1 s. upper ()

Out[228]: 1 'SUJAN KOIRALA '

3. count(): Counts the number of occurrences of a substring.

In [229]: 1 s. count ('a')

Out[229]: 1 3

XThere are 3 a’s in string s.

4. Replace a substring:

In [230]: 1 s2=s. replace ("Su", "Tsu")

5. List to String:

In [231]: 1 a_list =[ 'a','b','c']

2 a_str =" and ".join(str(x) for x in a_list )

3 a_str

Out[231]: 1 'a and b and c'
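The string methods above can be combined in one short sketch (Python 3). The string is the one used in this section; the variable names are illustrative.

```python
s = 'Sujan Koirala'

words = s.split()                 # no delimiter: split at whitespace
parts = s.lower().split('a')      # 'a' as delimiter, after lower-casing
n_a = s.count('a')                # counting is case sensitive
s2 = s.replace('Su', 'Tsu')       # returns a new string; s is unchanged
joined = ' and '.join(str(x) for x in ['a', 'b', 'c'])

print(words, parts, n_a, s2, joined)
```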

4.6. OTHER USEFUL FUNCTIONS

1. astype('type code'): Returns an array with the same elements coerced to the

type indicated by type code in Table 3.1. It is useful to save data as some type.

In [232]: 1 A. astype ('f')

Out[232]: 1 array ([[ -2. , 2.] ,[ -5. , 5.]])


2. tolist(): Converts the array to an ordinary list with the same items.

In [233]: 1 A. tolist ()

Out[233]: 1 [[-2, 2], [-5, 5]]

3. byteswap(): Swaps the bytes of each item in an array and returns the byteswapped array. If the first argument is 'True', the array is byteswapped in place. Supported byte sizes are 1, 2, 4, or 8. It is useful when reading data from a file written on a machine with a different byte order. For details on machine dependency, refer to http://en.wikipedia.org/wiki/Endianness. To convert data from big endian to little endian or vice versa (for example, if your data were written on a big-endian machine), add byteswap() on the same line where 'fromfile' is used.

http://en.wikipedia.org/wiki/Endianness

5 ESSENTIAL PYTHON SCRIPTING

Control statements, structure of a Python program, and system commands.


5.1. CONTROL FLOW TOOLS

This section briefly introduces common control statements in Python. The control statements are written in a block structure and do not have an end statement. The end of a block is expressed by indentation.

5.1.1. IF STATEMENT

The if statement is used to test a condition, which can have True or False values. An example if block is:

In [234]: if x < 0:
              print x,'is a negative number'
          elif x > 0:
              print x,'is a positive number'
          else:
              print 'x is zero'

✓ There can be zero or more elif statements, and the else statement is also optional.

The if statement can also check whether a value exists within an iterable such as a list, tuple, array, or string.

In [235]: a_list=['a','d','v',2,4]
          if 'd' in a_list:
              print a_list.index('d')

Out[235]: 1

In [236]: a_str='We are in a Python Course'
          if 'We' in a_str:
              print a_str

Out[236]: We are in a Python Course
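The same tests can be written as a small function, sketched here for Python 3 (print is a function there). The function name describe and the variable position are hypothetical names used only for this illustration.

```python
def describe(x):
    """Return a short description of the sign of x."""
    if x < 0:
        return '{} is a negative number'.format(x)
    elif x > 0:
        return '{} is a positive number'.format(x)
    else:
        return 'x is zero'


# Membership test with 'in' before asking for the index.
a_list = ['a', 'd', 'v', 2, 4]
position = a_list.index('d') if 'd' in a_list else None

print(describe(-3))
print(describe(5))
print(describe(0))
print(position)
```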

5.1.2. FOR STATEMENT

The for statement iterates over the items of any sequence (such as a list or a string), and repeats the steps within the for loop for each item.

In [237]: words=['cat', 'window', 'defenestrate']
          for _wor in words:
              print _wor, words.index(_wor), len(_wor)

A dictionary can be iterated using items, keys, or values.

In [238]: words={1:'cat', 2:'window', 3:'defenestrate'}
          for _wor in words.items():
              print _wor, len(_wor)

Out[238]: (1, 'cat') 2
          (2, 'window') 2
          (3, 'defenestrate') 2

✓ Each item is a (key, value) tuple, so its length is 2.

5.1.3. WHILE STATEMENT

The while statement is similar to the if statement, but the block is repeated as long as the condition is true. The while loop ends when the condition becomes false.

In [239]: count=0
          while (count < 2):
              print 'The count is:', count
              count=count+1
          print "Good bye!"

Out[239]: The count is: 0
          The count is: 1
          Good bye!
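A Python 3 sketch of both loop types follows. Instead of printing inside the loops, the results are collected in containers (lengths and messages, both illustrative names) so that they can be inspected afterwards.

```python
words = {1: 'cat', 2: 'window', 3: 'defenestrate'}

# items() yields (key, value) tuples, which can be unpacked directly.
lengths = {}
for key, word in words.items():
    lengths[word] = len(word)

# The while block repeats as long as the condition is true.
count = 0
messages = []
while count < 2:
    messages.append('The count is: {}'.format(count))
    count = count + 1
messages.append('Good bye!')

print(lengths)
print(messages)
```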

5.1.4. BREAK AND CONTINUE

The break statement breaks out of the smallest enclosing for or while loop. The continue statement skips the rest of the current iteration and continues with the next iteration of the same loop. In the first example, the else clause belongs to the inner for loop and is executed only when that loop finishes without a break.

In [240]: for n in range(2, 10):
              for x in range(2, n):
                  if n % x == 0:
                      print n, 'equals', x, '*', n/x
                      break
              else:
                  # loop fell through without finding a factor
                  print n, 'is a prime number'

Out[240]: 2 is a prime number
          3 is a prime number
          4 equals 2 * 2
          5 is a prime number
          6 equals 2 * 3
          7 is a prime number
          8 equals 2 * 4
          9 equals 3 * 3


In [241]: for num in range(2, 10):
              if num % 2 == 0:
                  print "Found an even number", num
                  continue
              print "Found a number", num

Out[241]: Found an even number 2
          Found a number 3
          Found an even number 4
          Found a number 5
          Found an even number 6
          Found a number 7
          Found an even number 8
          Found a number 9
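The two listings above can be rewritten for Python 3 so that the results are stored rather than printed. This is only a sketch; the function name classify and the containers primes, composites, and odd_numbers are hypothetical names introduced here.

```python
def classify(limit):
    """Split the integers 2..limit-1 into primes and factored composites."""
    primes = []
    composites = {}
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                composites[n] = (x, n // x)   # first factor found
                break                         # leave the inner loop only
        else:
            # the inner loop finished without break: no factor was found
            primes.append(n)
    return primes, composites


primes, composites = classify(10)

# continue skips the rest of the current iteration.
odd_numbers = []
for num in range(2, 10):
    if num % 2 == 0:
        continue
    odd_numbers.append(num)

print(primes, composites, odd_numbers)
```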

5.1.5. RANGE

As shown in previous chapters and examples, range is used to generate a list of numbers from start to end at an interval step. In Python 2, range generates the whole list object, whereas in Python 3, it returns a lazy range object that produces the numbers on demand and therefore does not use memory redundantly.

5.2. PYTHON FUNCTIONS

Sections of a Python program can be put into an indented block (a container of code) defined as a function. A function can be called "on demand". A basic syntax is as follows:

In [242]: def funcname(param1, param2):
              prod=param1*param2
              print prod
              return

In [243]: type(funcname(2,3))

Out[243]: 6
          NoneType

• def identifies a function. A name follows the def.

• Parameters can be passed to a function. These parameter values are substituted before calculation.

• return: Returns the result of the function. In the above example, return is used without a value, so the function returns None (an object of NoneType).

If the return statement includes a value, the result is passed on to the statement
that calls the function. Default values of the parameters can also be set. If the
function is called without any arguments, the default values are used for the
calculation. Here is an example.

In [244]:
    def funcname(param1=2, param2=3):
        prod = param1 * param2
        return prod

In [245]:
    funcname()

Out[245]:
    6

In [246]:
    funcname(3, 4)

Out[246]:
    12

In [247]:
    type(funcname(2, 3))

Out[247]:
    int
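Default values also make it possible to set only some of the parameters by name. The following is a small sketch (Python 3 syntax; the function names product and show are made up for this illustration) that also contrasts a function returning a value with one that ends on a bare return:

```python
def product(param1=2, param2=3):
    """Return the product of the two parameters."""
    return param1 * param2


def show(param1=2, param2=3):
    """Print the product and return nothing."""
    print(param1 * param2)
    return


default_result = product()           # both defaults are used
positional_result = product(3, 4)    # both parameters given by position
keyword_result = product(param2=10)  # param1 keeps its default value 2
nothing = show(2, 3)                 # prints 6, but the returned object is None
```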

For details on defining functions, refer to
https://docs.python.org/release/1.5.1p1/tut/defining.html

5.3. PYTHON MODULES

A module is a file containing Python definitions and statements. A module file (with
.py ending) provides a module named filename. This module name is available as
__name__ once the module is imported.

In [248]:
    #!/Users/skoirala/anaconda/envs/pyfull/bin/python
    import numpy as np

    def lcommon(m, n):  # def, then the function's name with parameters: funcname(parameter)
        if m > 0 and n > 0 and type(m) == int and type(n) == int:  # int means integer
            a = []  # an empty list
            for i in range(1, n+1):  # i runs from 1 to n, that is [1, 2, ..., n]
                M = m*i
                for k in range(1, m+1):
                    N = n*k
                    if M == N:  # M (= N) is a common multiple of m and n
                        a = np.append(a, M)  # store the common multiple of m and n in a
            return (min(a))  # return the minimum value in a
        else:
            return ("error")

    def computeHCF(x, y):
        # choose the smaller number
        if x > y:
            smaller = y
        else:
            smaller = x
        # the last i that divides both numbers is the highest common factor
        for i in range(1, smaller+1):
            if ((x % i == 0) and (y % i == 0)):
                hcf = i
        return hcf

• After a file which includes a function is created and saved, the function can be
used in an interactive shell started in the directory containing the file, or in other
files in the same directory, by importing the file as a module.

If the saved filename is samp_func.py, the function can be called from another
program file in the same directory.

In [249]:
    import samp_func as sf
    print sf.lcommon(3, 29)

Out[249]:
    87.0

If the program is run, you get the number 87.0, which is the least common multiple
of 3 and 29 (a float, because np.append turns the list into a floating-point array).
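The result can be cross-checked without any loops. As a sketch that is not part of samp_func.py (Python 3.5 or later, where math.gcd is available; the name lcm is chosen here for illustration), the least common multiple follows from the highest common factor:

```python
from math import gcd


def lcm(m, n):
    """Least common multiple of two positive integers."""
    # m * n is always divisible by gcd(m, n), so integer division is exact.
    return m * n // gcd(m, n)


result = lcm(3, 29)
print(result)
```

Unlike lcommon, this version returns an integer, and it needs far fewer steps when m and n are large.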


The module file can also be run as a standalone program if the following block of
code is added to the end. The results are printed so that they show up on the
command line.

In [250]:
    if __name__ == "__main__":
        import sys
        print lcommon(int(sys.argv[1]), int(sys.argv[2]))
        print computeHCF(int(sys.argv[1]), int(sys.argv[2]))

Also, the variables defined in the module file can be accessed as long as they are
defined at the module level, that is, not within functions of the module. For example,
if samp_func.py contains

In [251]:
    somevariable = ['1', 2, 4]

then

In [252]:
    import samp_func as sf
    print sf.somevariable

Out[252]:
    ['1', 2, 4]

A list of all the objects from the module can be obtained by using dir() as

In [253]:
    dir(sf)

5.4. PYTHON CLASSES

As Python is a fully object-oriented language, it provides the class statement, which
allows you to create (instantiate) objects. A class contains only structure: it defines
how something should be laid out or structured, but doesn't actually fill in the content.
This is useful when a set of operations is to be carried out on several instances, and it
keeps the data of every created object distinct. The following is an example class taken
from http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/anatomy/classes.html

In [254]:
    import math

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __str__(self):
            return "Point(%d, %d)" % (self.x, self.y)

        def distance_from_origin(self):
            return math.sqrt(self.x**2 + self.y**2)

• It is customary to start class names with an upper case letter.

• __init__ and self are critical to create an object.

• self is the object that is created when the class is called.

• __init__ initializes the new object self and assigns the attributes x and y to it.

In [255]:
    p1 = Point(1, 4)
    p2 = Point(2, 3)
    p1.x

Out[255]:
    1

In [256]:
    p1.distance_from_origin()

Out[256]:
    4.123105625617661

In [257]:
    p2.distance_from_origin()

Out[257]:
    3.605551275463989
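To see how self lets one object work with another, here is a sketch (Python 3 syntax) that repeats the Point class and adds one method. The method distance_to is a hypothetical extension written for this illustration and is not part of the original example:

```python
import math


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point(%d, %d)" % (self.x, self.y)

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

    def distance_to(self, other):
        # Hypothetical extra method: self is this point, other is a second Point.
        return math.sqrt((self.x - other.x)**2 + (self.y - other.y)**2)


p1 = Point(1, 4)
p2 = Point(2, 3)
label = str(p1)                  # str() calls __str__
separation = p1.distance_to(p2)  # each object carries its own x and y
print(label, separation)
```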

Some simple and easy to understand examples of classes are provided in:

• http://www.jesshamrick.com/2011/05/18/an-introduction-to-classes-and-inheritance-in-python/

• https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/

5.5. ADDITIONAL RELEVANT MODULES

This section briefly introduces modules that are useful for running a Python program
efficiently.

5.5.1. SYS MODULE

This module provides access to some variables used or maintained by the Python

interpreter.

• argv: Probably the most useful attribute of sys. sys.argv is a list object containing
the arguments given when running a Python script from the command line. The first
element, sys.argv[0], is always the program name. Any number of arguments can be
passed into the program as strings, e.g., sys.argv[1] is the first argument after the
program name (the second element of the list) and so on.



• byteorder: Remember the byteswap? sys.byteorder provides info on the machine
endianness.

• path: The list of directories that Python searches for modules is stored in sys.path.
If you have written modules and classes, and want to access them from anywhere, you
can add their directory to sys.path as,

In [258]:
    import sys
    sys.path.append('path to your directory')

A full set of sys attributes and functions is documented at
https://docs.python.org/2/library/sys.html
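As a short sketch of how sys.argv is typically used (Python 3 syntax; the function name parse_args and the usage message are made up for this illustration), the argument list can be handed to a small function that converts the strings:

```python
import sys


def parse_args(argv):
    """Return the two integers that follow the program name."""
    if len(argv) != 3:
        # argv[0] is the program name, so two arguments mean three elements.
        raise SystemExit("usage: %s m n" % argv[0])
    return int(argv[1]), int(argv[2])


# Simulated command line, equivalent to running: python samp_func.py 3 29
m, n = parse_args(["samp_func.py", "3", "29"])
print(m, n)

# In a real script the list comes from the interpreter:
# m, n = parse_args(sys.argv)
```

Passing the list as a parameter, rather than reading sys.argv inside the function, keeps the function easy to try out from an interactive shell.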

5.5.2. OS MODULE

This module provides a unified interface to a number of operating system functions.
There are lots of useful functions for process management and file object creation in
this module. Among them, the functions for manipulating files and directories are
especially useful and are briefly introduced below. For details on the os module, see
http://docs.python.org/library/os.html

Before using file and directory commands, it is necessary to import the os module as,

In [259]:
    import os
    os.getcwd()

Same as pwd in UNIX. pwd stands for print working directory, and the command
returns the absolute path to the current directory.

In [260]:
    os.mkdir('dirname')

Same as mkdir in UNIX. Makes a new directory. dirname can be an absolute or
relative path to the directory you want to create.

In [261]:
    os.remove('filename')

Same as rm in UNIX. Removes a file.

In [262]:
    os.rmdir('dirname')

Same as rmdir in UNIX. Removes an empty directory. To remove a directory together
with its contents (rm -r in UNIX), use shutil.rmtree instead.

In [263]:
    os.chdir('dirpath')

Same as cd in UNIX. Changes directory to the location given by dirpath.
dirpath can be absolute or relative.

https://docs.python.org/2/library/sys.html

http://docs.python.org/library/os.html


In [264]:
    os.listdir('dirpath')

Same as ls in UNIX. Lists all files and directories in the directory located at dirpath.

In [265]:
    os.path.exists('filepath')

A very useful os function that checks if a file or directory exists. Returns True if it
exists and False if not.

If you want to know more about these functions, see
http://docs.python.org/library/allos.html
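The commands above can be tried out safely in a scratch directory. The following sketch (Python 3 syntax; the scratch directory comes from the standard tempfile module, and the names dirname and filename.txt are placeholders) creates, inspects and removes a file and a directory:

```python
import os
import tempfile

start_dir = os.getcwd()          # remember where we started
base = tempfile.mkdtemp()        # scratch directory, so nothing real is touched
os.chdir(base)

os.mkdir('dirname')
filepath = os.path.join('dirname', 'filename.txt')
with open(filepath, 'w') as f:
    f.write('some text\n')

listing = os.listdir('dirname')
existed_before = os.path.exists(filepath)

os.remove(filepath)              # the directory must be empty before rmdir
existed_after = os.path.exists(filepath)
os.rmdir('dirname')

os.chdir(start_dir)              # go back, then clean up the scratch directory
os.rmdir(base)
print(listing, existed_before, existed_after)
```

os.path.join builds the path with the separator of the current operating system, which keeps the script portable between UNIX and Windows.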

5.5.3. ERRORS AND EXCEPTIONS

There are two types of errors in Python, in general: Syntax Errors and Exceptions.
Syntax errors are caused by mistakes in the syntax. Exceptions may be caused by Value,
Name, IO, Type, etc. A full list of exceptions in Python is given at
https://www.tutorialspoint.com/python/standard_exceptions.htm

Python has built-in statements (try and except) to handle exceptions. An example
is below:

In [266]:
    while True:
        try:
            x = int(raw_input("Please enter a number: "))
            break
        except ValueError:
            print "Oops! That was no valid number. Try again..."

The code tries to convert the input to int. If the input cannot be converted (for
example, letters are entered), int raises a ValueError, the message is printed, and
the loop goes to the next iteration. The above is the simplest example, and there are
many other more "sophisticated" ways to handle exceptions, described at
https://docs.python.org/2/tutorial/errors.html
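One of those more sophisticated forms is sketched below (Python 3 syntax; the function name to_int and its default parameter are made up for this illustration). It handles two different exceptions and uses the else clause, which runs only when no exception was raised:

```python
def to_int(text, default=None):
    """Convert text to int, returning default when the conversion fails."""
    try:
        value = int(text)
    except ValueError:
        # Raised for strings that are not whole numbers, such as "abc" or "1.5".
        return default
    except TypeError:
        # Raised for objects that cannot be converted at all, such as None.
        return default
    else:
        # Runs only when the try block finished without an exception.
        return value


results = [to_int("42"), to_int("abc"), to_int(None, default=-1)]
print(results)
```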

http://docs.python.org/library/allos.html

https://www.tutorialspoint.com/python/standard_exceptions.htm

https://docs.python.org/2/tutorial/errors.html

6 ADVANCED STATISTICS AND MACHINE LEARNING

Using advanced statistics.


6.1. QUICK OVERVIEW

In this section, we will focus on a practical example to demonstrate the implementations

of some advanced statistics, specifically machine learning algorithms, to perform gap

filling of eddy covariance data. The concept is to take some gappy data and fill the holes

using the meteorological variables associated with the missing values, then compare

the methods. It should be noted that we will not go into depth about the statistical

methods themselves, but just give an example of the implementation. Indeed, in most
cases we will use the default hyper-parameters, which is nice for an overview, but bad
practice overall. One should always try to understand a method when implementing
it.

6.1.1. REQUIRED PACKAGES

This exercise will require the following packages (all should be available via "conda

install..."):

• numpy

• scipy

• pandas

• scikit-learn

• statsmodels

• netCDF4 (needs hdf4)

6.1.2. OVERVIEW OF DATA

Our sample dataset (provided by me) is a processed eddy covariance file, such as
what you would find in the FLUXNET database. If you are unfamiliar with eddy
covariance, don't panic, just think of it as a fancy weather station that measures not
just the meteorological data, but how things come and go from the ecosystem (such
as water and carbon). This file is in half-hourly resolution, so it gives a value for
each measured variable every half hour, or 48 points per day (17,520 points per
year). One problem with eddy covariance datasets is that they tend to have
missing values, or gaps, due to equipment failures or improper measuring conditions.
So to fix this, we can predict the missing values, or gap-fill the dataset. This particular
dataset has about 40% of the data missing. As we are not the first to deal with gappy
eddy covariance datasets, there is a current "standard" method involving sorting all
the values into a look-up table, where values from a similar time-span and meteo
conditions are binned, and the gaps are filled with the mean of the bin. We will try
to fill the gaps using three statistical methods: random forest, neural networks, and a
multi-linear regression.

6.2. IMPORT AND PREPARE THE DATA

We will try to organize this project somewhat like you would a real project, which means

we will have a number of ".py" files in our project, as well as our data files. So to

start, find a nice cozy place in your file system (maybe something like in "Documents"

or "MyPyFiles") and create a new folder (maybe called "AdvStat").

In our nice, new, cozy folder, we can first copy the sample dataset, which should

have the file extension ".nc". Now we can make three files, one named "Calc.py",

one named "Regs.py", and one named "Plots.py". These files can be created and/or

opened into the Spyder IDE to make things a bit easier to work with, or simply in your

favorite text editor.

Now, starting in the "Calc.py" file, we can import numpy and netCDF4 to start us
off. We will import the variables we are interested in and convert them into numpy
arrays. Because the provided file has over 300 variables, we will create a dictionary
containing only the subset of variables that we are interested in, based on a list, namely:

    IncludedVars = ['Tair_f', 'Rg_f', 'VPD_f', 'LE_f', 'LE',
                    'year', 'month', 'day', 'hour']

So to build our dictionary, we can start with an empty dictionary (remember "{}")
called "df". Then we can loop through our IncludedVars and use each item in the list
as a key for df, and pair each key with a numpy array from the netCDF ("ncdf" being
our opened netCDF file):

    df[var] = np.array(ncdf[var]).flatten()[48*365:]

You may notice two things: first is that we not only turn our netCDF variable
into a numpy array, but we also call "flatten". This is because the netCDF has three
dimensions (time, lat, lon), but as this is only one site, the lat and lon dimensions
don't change, so we can just flatten the array to one dimension. Second is that we are
already slicing the data from 48*365 onwards. This is because the first year is only a
partial year, so we not only have some gaps in the fluxes, but in all the data, which
will mess us up a bit. Thankfully for you, I have been through this dataset and can tell
you to skip the first year. Now, this netCDF is fairly well annotated, so if you would
like more information on a variable, simply ask:

    ncdf[var]
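If you want to see what the flatten and the slice do without opening the file, here is a sketch with made-up data (Python 3 syntax). The dictionary ncdf below is only a synthetic stand-in for the real netCDF file, holding two years of invented half-hourly values with dimensions (time, lat, lon):

```python
import numpy as np

n_steps = 48 * 365 * 2   # two synthetic years of half-hourly data

# Synthetic stand-in for the netCDF file (assumption: not the real dataset).
ncdf = {
    'Tair_f': np.linspace(-5.0, 25.0, n_steps).reshape(n_steps, 1, 1),
    'LE': np.full((n_steps, 1, 1), -9999.0),
}
IncludedVars = ['Tair_f', 'LE']

df = {}                  # start with an empty dictionary
for var in IncludedVars:
    # flatten (time, 1, 1) to (time,) and skip the first year
    df[var] = np.array(ncdf[var]).flatten()[48*365:]

print(df['Tair_f'].shape)
```

One year of half-hourly values remains for each variable, now as a simple one dimensional array.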

Some highlights are that we will be trying to gap-fill the "LE" variable (Latent

Energy, a measure of the water flux), which we can compare to the professionally filled

version "LE_f".

As this is a regression problem, we need to get things into an "X" vs "Y" format.
For the X variables we will use the following:

    XvarList = ['Tair_f', 'Rg_f', 'VPD_f', 'year', 'month', 'day', 'hour']

With our list, we can then create a 2 dimensional array in the form number-of-samples
by number-of-features. We can do this by first creating a list of the arrays,
then calling np.array and transposing. If we want to be fancy, we can do this in one
line as:

    X = np.array([df[var] for var in XvarList]).T

and like magic we are all ready to go with the Xvars. The Y variable is also easy,
it is just equal to LE, which if we remember is stored in our dictionary as df["LE"].
However, we will do a little trick that will seem a bit silly, but will make sense later.
Let's first store our Y variable name as a string, then set Y as:

    yvarname = "LE"
    Y = df[yvarname]
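To check that the transpose really gives samples by features, here is a tiny sketch with invented numbers (Python 3 syntax; four samples and only three of the predictors, so that the arrays are easy to read):

```python
import numpy as np

# Invented values, four half-hours of three predictors.
df = {
    'Tair_f': np.array([10.0, 11.0, 12.0, 13.0]),
    'Rg_f': np.array([0.0, 50.0, 200.0, 400.0]),
    'VPD_f': np.array([1.0, 1.5, 2.0, 2.5]),
    'LE': np.array([5.0, -9999.0, 40.0, 80.0]),
}
XvarList = ['Tair_f', 'Rg_f', 'VPD_f']

# The list of arrays has shape (features, samples), so we transpose it.
X = np.array([df[var] for var in XvarList]).T

yvarname = "LE"
Y = df[yvarname]
print(X.shape, Y.shape)
```

Each row of X now holds the predictors of one half-hour, lined up with the matching entry of Y.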

I promise this will come in handy. One final task is to figure out where the gaps
are, but we will come to that in the next section, which is...

6.3. SETTING UP THE GAPFILLERS

Now we can move on to our second python file in our cozy folder: "Regs.py". This file

will hold some of our important functions that help us complete our quest of gapfilling.


The only package to import will be numpy. After the import, we can make a very simple
function called "GetMask" that will find our gaps for us. As we extracted the data
from the netCDF, all gaps are given the value -9999, so our function will simply return
a boolean array where all gapped values are True. I tend to be a bit cautious, so I
usually look for things such as:

    mask = (Y < -9000)

but you could easily say (Y==-9999). Don't forget to return our mask at the end
of the function!
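Put together, the function is only a few lines. This is one possible version (Python 3 syntax), followed by a quick check on four invented values:

```python
import numpy as np


def GetMask(Y):
    """Return a boolean array that is True wherever Y is a gap (-9999)."""
    mask = (Y < -9000)
    return mask


# Quick check with invented values: the second and fourth are gaps.
Y = np.array([12.5, -9999.0, 30.1, -9999.0])
mask = GetMask(Y)
print(mask)
```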

Now, so we don't forget, we can go ahead and use this function in our "Calc.py"
file right away. First we need to tell "Calc.py" where to find the "GetMask", so in
"Calc.py" we simply

    import Regs

and we can set our mask as:

    mask = Regs.GetMask(df[yvarname])

Easy as that! Now, we will want to keep everything tidy, so go ahead and also save
our mask into our dictionary (df) as something like "GapMask".

Now, let's go back to "Regs.py" and make a second function. This function will
take any of the machine learning algorithms that we will use from the SKLearn package
and gap fill our dataset, so let's call it "GapFillerSKLearn" and it will take four input
variables: X, Y, GapMask, and model. As this function will be a bit abstract, let's add
some documentation, which will be a string right after we define the function. I have
made an example documentation for our function here:

    def GapFillerSKLearn(X, Y, GapMask, model):
        """
        GapFillerSKLearn(X, Y, GapMask, model)

        Gap fills Y via X with model

        Uses the provided model to gap fill Y via the X variable

        Parameters
        ----------
        X : numpy array
            Predictor variables
        Y : numpy array
            Training set
        GapMask : numpy boolean array
            array indicating where gaps are with True
        model : SKLearn regression object
            model used to fit and predict

        Returns
        -------
        Y_hat
            Gap filled Y as numpy array
        """

Now that the function is documented, we will never forget what this function does.
So we can now move on to the actual function. The reason we can write this
function is that the SKLearn module organizes all of its regressions in the same
way, so the same calls work on "model" whether it is a random forest or a neural
net. In all cases we fit the model as:

    model.fit(X[~GapMask], Y[~GapMask])

where we are fitting only where we don't have gaps. In this case the ~ (tilde) inverts
the boolean array, making all Trues False and all Falses True, which in our case now
gives True to all indices where we have original data. Next we can build our Y_hat
variable as an array of -9999 values by first creating an array of zeros and subtracting
9999. This way, if we mess up somewhere, we can see it in the final values as a -9999.
Now, we can fill the gaps by making a prediction of the model with the Xvars as

    Y_hat[GapMask] = model.predict(X[GapMask])

where we are no longer using the tilde (~) because we want the gap indices. We can
return our Y_hat at the end of our function and move back to our "Calc.py" file.
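Assembled, the body of the function could look like the sketch below (Python 3 syntax). To keep the sketch runnable without scikit-learn, it is tried out with MeanModel, a toy class invented here that only mimics the fit and predict calls of an SKLearn regressor:

```python
import numpy as np


def GapFillerSKLearn(X, Y, GapMask, model):
    """Gap fill Y via X with any model that offers fit and predict."""
    model.fit(X[~GapMask], Y[~GapMask])           # train where we have data
    Y_hat = np.zeros(Y.shape) - 9999              # array full of -9999
    Y_hat[GapMask] = model.predict(X[GapMask])    # predict only the gaps
    return Y_hat


class MeanModel:
    """Toy stand-in for an SKLearn regressor (assumption, not from SKLearn)."""

    def fit(self, X, Y):
        self.mean_ = Y.mean()
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.mean_)


X = np.arange(10, dtype=float).reshape(5, 2)
Y = np.array([1.0, -9999.0, 3.0, -9999.0, 5.0])
GapMask = Y < -9000
Y_hat = GapFillerSKLearn(X, Y, GapMask, MeanModel())
print(Y_hat)
```

Note that Y_hat holds predictions only at the gaps. Every other entry stays at -9999, which is why the plots later on index with the gap mask.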


6.4. ACTUALLY GAPFILLING

With our X, Y, mask, and filling functions built, we can actually do some calculations.
For this, we will need to import some more packages, namely:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neural_network import MLPRegressor
    import statsmodels.api as sm

where our random forest (RandomForestRegressor) and neural network (MLPRegressor,
or Multi-layer Perceptron) are from the SKLearn package and our linear model
will be from the statsmodels package. As everything is set up, we can immediately call
our SKLearn gap filler function as:

    df[yvarname+'_RF'] = Regs.GapFillerSKLearn(X, Y, mask,
                                               RandomForestRegressor())

and likewise for the MLPRegressor (just remember to change the df key!). Note
that there are many, many options for both RandomForestRegressor and MLPRegressor
that should likely be changed, but as this is a quick overview, we will just use the
defaults. If you were to add the options, such as increasing to 50 trees in the random
forest, it would look like this

    df[yvarname+'_RF'] = Regs.GapFillerSKLearn(X, Y, mask,
                                               RandomForestRegressor(n_estimators=50))

Unfortunately we cannot use the same function for the linear model, as statsmodels
uses a slightly different syntax (note that SKLearn also has an implementation for linear
models, but it's good to be well rounded). The statsmodels portion will look strikingly
similar to our "GapFillerSKLearn" function, but with some key differences:

    X_ols = sm.add_constant(X)
    df[yvarname+'_OLS'] = Y.copy()  # copy, so that filling does not overwrite Y itself
    model = sm.OLS(Y[~mask], X_ols[~mask])
    results = model.fit()
    df[yvarname+'_OLS'][mask] = results.predict(X_ols[mask])

Basically, we have to add another column to our array that acts as the intercept variable,
then we run the same set of commands, but the pesky X's and Y's are switched and are
passed when the model is created rather than in the fit command, making it too
different to adapt for our "GapFillerSKLearn" function.

Now, our script is basically done, and we can actually run it (in Spyder, just press
f5).
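If you are curious what the added constant does, the sketch below redoes the linear fill with plain numpy instead of statsmodels (Python 3 syntax; np.linalg.lstsq is swapped in for sm.OLS here, and the five data points are invented so that the true line is y = 2x + 1):

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
Y = np.array([1.0, 3.0, -9999.0, 7.0, 9.0])   # one gap, at x = 2
mask = Y < -9000

# What adding a constant amounts to: a column of ones for the intercept.
X_ols = np.column_stack([np.ones(X.shape[0]), X])

# Least-squares fit on the rows that have data.
coef, residuals, rank, sv = np.linalg.lstsq(X_ols[~mask], Y[~mask], rcond=None)

Y_ols = Y.copy()                      # copy, so that Y keeps its gaps
Y_ols[mask] = X_ols[mask].dot(coef)   # intercept + slope * x at the gaps
print(coef, Y_ols)
```

The first coefficient belongs to the column of ones and is therefore the intercept. The copy matters: without it, Y_ols and Y would be the same array, and filling one would also fill the other.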


Depending on the speed of your computer, it may take a few seconds to run, more
than you might want to wait for over and over. Therefore, before we move on to the
"Plots.py" file, it would be a good idea to save the data so we don't have to run it every
time. For this, we will use the "pickle" package. "pickle" does a nice job of saving
python objects as binary files, which Sujan loves, so after we import the package, we
can dump our pickle with:

    pickle.dump(df, open(yvarname+"_GapFills.pickle", "wb"))

You can notice that we save the file with our yvarname, which you will see can
come in handy.
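A slightly more careful variant is sketched below (Python 3 syntax). It uses the with statement so that the file is closed as soon as the dump is done, and it loads the pickle again to check the round trip. The small dictionary and the temporary folder are invented for the example:

```python
import os
import pickle
import tempfile

yvarname = "LE"
# Invented stand-in for the real dictionary of arrays.
df = {yvarname: [1.0, 2.0], yvarname + "_RF": [1.1, 2.1]}

# Write into a temporary folder so the example leaves our project untouched.
filename = os.path.join(tempfile.mkdtemp(), yvarname + "_GapFills.pickle")

with open(filename, "wb") as f:      # "wb": pickles are binary files
    pickle.dump(df, f)

with open(filename, "rb") as f:      # "rb": read them back as binary
    restored = pickle.load(f)

print(restored == df)
```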

6.5. AND NOW THE PLOTS!

Now we can finally move on to our "Plots.py" file, where we will need the numpy,
pandas, and pickle packages. To start, we will keep things simple and just do a
comparison of each gap filling method to the standard "LE_f" from the datafile.
After comparing these, we will use a kernel density estimate to look at the distribution
of our gap-filled values compared to the real, measured values. So in total we will have
four figures.

First, we will use the exact same mysterious trick that we have been using where
we set the yvarname:

    yvarname = "LE"

Again, mysterious and will be useful I promise.

Now we will need to load the datafile we just created from "Calc.py", but this
time instead of using a dictionary, as the data is all neatly named and every vector is
the same length, we can use the magic of Pandas! So as we load our pickle, we can
directly convert it to a Pandas DataFrame with

    df = pd.DataFrame.from_dict(pickle.load(open(yvarname+"_GapFills.pickle", "rb")))

Now, in the python or ipython console, you can explore "df" a little bit and see
that it is a nice and orderly DataFrame, which R users will feel right at home in. And
with this DataFrame, we can do much of our initial plotting directly, so we don't even
have to import Matplotlib.


6.5.1. SCATTER PLOTS!

As we have three different methods to compare, we can write the plotting steps as a
function so we avoid doing all that copying and pasting. Let's call our function "GapComp"
and it will take the input variables df, xvar, yvar, and GapMask. First thing we will do
is make our scatter plot of the gap filled values. Pandas actually comes with much
of the plotting functionality built in, so the plot becomes one line:

    fig = df[GapMask].plot.scatter(x=xvar, y=yvar)

Notice that we will be using our boolean array "GapMask" to index the entire
DataFrame, this is the magic of Pandas. Now, we could call it a day, but what fun
is a scatter plot without some lines on it. So, we will add the results of a linear
regression between our gap filling and the "LE_f" using the "linregress" function from
"scipy.stats" (go ahead and add it to the import list). "linregress" gives a nice output
of a simple linear regression including all the standard stuff:

    slope, intercept, r_value, p_value, std_err = linregress(df[GapMask][xvar],
                                                             df[GapMask][yvar])

Now that we have fit a model to our models, we can plot our line. We will need
an x variable that spans the x axis of our plot, for which we can use the "numpy.linspace"
command as

    x = np.linspace(df[GapMask][xvar].min(), df[GapMask][xvar].max())

And finally, we can plot our line with a nice label showing both our equation and
the r^2 value with

    fig.plot(x, x*slope+intercept,
             label="y={0:0.4}*x+{1:0.4}, r^2={2:0.4}".format(slope, intercept, r_value**2))
    fig.legend()

And that finishes our function. We can now plot all of our models with a neat little
for loop:

    for var in ["_RF", '_NN', '_OLS']:
        GapComp(df, yvarname+var, yvarname+"_f", df.GapMask)
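The numbers behind the line and its label can be checked without any plotting. In the sketch below (Python 3 syntax), np.polyfit and np.corrcoef are swapped in for scipy's linregress so that only numpy is needed, and the five data points are invented:

```python
import numpy as np

# Invented points scattered around a straight line.
xdata = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ydata = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Degree-1 polynomial fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(xdata, ydata, 1)
r_value = np.corrcoef(xdata, ydata)[0, 1]

# linspace gives 50 evenly spaced points by default, enough for a straight line.
x = np.linspace(xdata.min(), xdata.max())
line = x * slope + intercept

# "0.4" in the format string keeps four significant digits.
label = "y={0:0.4}*x+{1:0.4}, r^2={2:0.4}".format(slope, intercept, r_value**2)
print(label)
```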


6.5.2. DISTRIBUTIONS WITH KDE

With the first three plots done, we can move on to our kernel density plots. Again
Pandas will make our lives easier, as instead of "df.plot.scatter" we use "df.plot.kde".
Remember we want to compare both our gap filling techniques and the "LE_f" with
the distribution of the real dataset. We can start with plotting the filled dataset using
only the filled values ("df.GapMask"). One fancy trick of Pandas is that you can pass a list
of columns, and it will plot all of them. However, because of our mysterious magic
trick with "yvarname", we have to build this list with a little loop, which looks like

    [yvarname+ending for ending in ("_RF", '_NN', '_OLS', "_f")]

Now we can pass this fancy list, either as a named variable, or in a one-liner if we
are even fancier, to the command

    KDEs = df[df.GapMask][ThisFancyList].plot.kde()

where our plot is saved as the variable KDEs. Now, we have to plot our final KDE
from the "LE" column, but we can no longer call it using "KDEs.plot" like we did for
our line in the "GapComp" function. What we have to do then is tell the "df.plot.kde"
command which plot we want it in. For this, we pass the "ax=" argument like so

    df[~df.GapMask][[yvarname]].plot.kde(ax=KDEs)

and voilà, our plotting is complete! There, some advanced statistics, easy as cake.

6.6. BONUS POINTS!

For some bonus points, you can gap fill another variable called "NEE". NEE stands
for net ecosystem exchange, and it measures how carbon comes and goes from the
ecosystem. All you have to do is extract it from the netCDF (both the NEE and
NEE_f), then switch out all the times you reference LE (hint, we can finally use the
magic trick).
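As a small sketch of why the trick pays off (Python 3 syntax; the names follow the ones used in this chapter), every key and file name is built from yvarname, so changing one string switches the whole workflow to the new variable:

```python
yvarname = "NEE"   # the only line that changes when moving from LE to NEE

endings = ("_RF", "_NN", "_OLS", "_f")
columns = [yvarname + ending for ending in endings]
filename = yvarname + "_GapFills.pickle"

print(columns)
print(filename)
```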

7 DATA VISUALIZATION AND PLOTTING

An introduction to plotting using matplotlib and Bokeh


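The first part of this chapter introduces plotting standard figures using matplotlib
(http://matplotlib.sourceforge.net/index.html) and the second part introduces
interactive plotting using Bokeh (http://bokeh.pydata.org/en/latest/).

For a comprehensive set of matplotlib examples, with the source code used to plot
each figure, see http://matplotlib.sourceforge.net/gallery.html. For the same for
Bokeh, see http://bokeh.pydata.org/en/latest/docs/gallery.html.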

7.1. PLOTTING A SIMPLE FIGURE

Read the data in the data folder using:

In [267]:
    import numpy as np
    dat = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]

First, a figure object can be defined. figsize is the figure size as a (width, height) tuple.
The unit is inches.

In [268]:
    from matplotlib import pyplot as plt
    plt.figure(figsize=(3, 4))

In [269]:
    plt.plot(dat)
    plt.show()

There are several keyword arguments such as color, style and so on that control
the appearance of the line object. They are listed at
http://matplotlib.org/api/lines_api.html#matplotlib.lines.Line2D. The line and marker
styles in matplotlib are shown in Table 7.1.

For axis labels and figure title:

In [270]:
    plt.xlabel('time')
    plt.ylabel('Precip', color='k', fontsize=10)
    plt.title('One Figure')

The axis limits can be set by using xlim() and ylim() as:

In [271]:
    plt.xlim(0, 200)
    plt.ylim(0, 1e14)

Text can be placed in the axes with text() and in the figure with figtext():

In [272]:
    plt.text(0.1, 0.5, 'the first text', fontsize=12, color='red', rotation=45, va='bottom')
    plt.text(0.95, 0.95, 'the second text', fontsize=12, color='green', ha='right',
             transform=plt.gca().transAxes)
    plt.figtext(0.5, 0.5, 'the third text', fontsize=12, color='blue')

The color and font size can be changed. For color, use color= with a color name
such as 'red' or a hexadecimal color code such as '#0000FF'. For font size, use
fontsize=number (number is > 0).

http://matplotlib.sourceforge.net/index.html

http://bokeh.pydata.org/en/latest/

http://matplotlib.sourceforge.net/gallery.html

http://bokeh.pydata.org/en/latest/docs/gallery.html

http://matplotlib.org/api/lines_api.html#matplotlib.lines.Line2D


Table 7.1: Line and marker styles

    Line style                Marker style
    Linestyle    Lines        Marker    Signs
    'solid'      —            'o'       Circle
    'dashed'     −−           'v'       Triangle_down
    'dotted'     · · ·        '^'       Triangle_up
                              '<'       Triangle_left
                              '>'       Triangle_right
                              's'       Square
                              'h'       Hexagon
                              '+'       Plus
                              'x'       X
                              'd'       Diamond
                              'p'       Pentagon

Also, grid lines can be turned on by using

In [273]:
    plt.grid(which='major', axis='x', ls=':', lw=0.5)

To set the scale to log

In [274]:
    plt.yscale('log')

7.2. MULTIPLE PLOTS IN A FIGURE

Matplotlib has several methods to make subplots within a figure. Here are some quick
examples of using the 'mainstream' subplots.

In [275]:
    selVars = 'Precipitation Runoff'.split()
    nrows = 2
    ncols = 1
    plt.figure(figsize=(3, 4))
    for _var in selVars:
        dat = np.loadtxt('data/FD-'+_var+'_Ganges_daily_kg.txt')[:365]
        spI = selVars.index(_var)+1  # subplot numbers start from 1
        plt.subplot(nrows, ncols, spI)
        plt.plot(dat)


7.3. PLOT WITH DATES

The datetime module supplies classes for manipulating dates and times. This module
comes in handy when calculating temporal averages, such as monthly means from daily
time series. When these date objects are combined with the dates functions of matplotlib,
time series data can be plotted with the axis formatted as dates. First import the necessary
modules and functions. timeOp is a self made module consisting of functions to convert
daily data to monthly or yearly data.

In [276]:
    # a self made module to compute monthly data from daily data
    # considering calendar days and so on
    import timeOp as tmop
    import numpy as np
    import matplotlib as mpl
    from matplotlib import pyplot as plt
    from matplotlib import dates
    import datetime
    dat1 = np.loadtxt('data/FD-Precipitation_Amazon_daily_kg.txt')

Now, date objects can be created using the datetime module. In the current file, the
data is available from 1979-01-01 to 2007-12-31. Using these date instances, a range
of date objects can be created by using a step of dt, which is again a timedelta object from
datetime. A step of 30.5 days is an approximation of one month.

In [277]:
    sdate = datetime.date(1979, 1, 1)
    edate = datetime.date(2008, 1, 1)
    dt = datetime.timedelta(days=30.5)
    dates_mo = dates.drange(sdate, edate, dt)

Using the functions within the tmop module, monthly and yearly data are created.

In [278]:
    dat_mo = np.array([np.mean(_m) for _m in tmop.day2month(dat1, sdate)])
    dat_y = np.array([np.mean(_y) for _y in tmop.day2year(dat1, sdate)])
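Because months differ in length, dates spaced 30.5 days apart drift away from the true month boundaries. If exact month starts are needed, they can be built with datetime alone, as in the following sketch (Python 3 syntax; the function name month_starts is made up for this illustration and is not part of timeOp):

```python
import datetime


def month_starts(sdate, edate):
    """List the first day of every month from sdate up to, not including, edate."""
    starts = []
    year, month = sdate.year, sdate.month
    while datetime.date(year, month, 1) < edate:
        starts.append(datetime.date(year, month, 1))
        month += 1
        if month > 12:          # roll over into January of the next year
            year, month = year + 1, 1
    return starts


sdate = datetime.date(1979, 1, 1)
edate = datetime.date(2008, 1, 1)
months = month_starts(sdate, edate)
print(len(months), months[0], months[-1])
```

The 29 years from 1979 to 2007 give 348 month starts, one for each monthly value. Such a list of date objects can be turned into matplotlib date numbers with dates.date2num.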

Next up, we create axes instances on which the plots will be made. These axes
objects are the building blocks of all subplot-like objects in matplotlib and form the
basis for having as many subplots as one wants in a figure. An axes instance is defined
by using the axes command with [lower left x, lower left y, width, height] as an
argument. The coordinates and sizes are given relative to the figure, and thus, they
vary from 0 to 1.

In [279]:
    ax1 = plt.axes([0.1, 0.1, 0.6, 0.8])
    ax1.plot_date(dates_mo, dat_mo, ls='-', marker=None)

While plotting dates, the plot_date function is used with the date range as the
x variable and data as the y variable. Note that the sizes of x and y variables should
be the same. Automatically, the axis is formatted as years. The datetime module is
documented at https://docs.python.org/2/library/datetime.html and the dates functions
of matplotlib at http://matplotlib.org/api/dates_api.html.

In [280]:
    ax2 = plt.axes([0.75, 0.1, 0.25, 0.8])
    ax2.plot(dat_mo.reshape(-1, 12).mean(0))
    ax2.set_xticks(range(12))
    ax2.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
                         'Sep', 'Oct', 'Nov', 'Dec'], rotation=90)
    plt.show()

Sometimes, it is easier to set the ticks and labels manually. In this case, the
mean seasonal cycle is plotted normally, and the xticks are changed to look like dates.
Remember that with a proper date range object, this can be achieved automatically with
plot_date as well.
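The mean seasonal cycle comes from the reshape(-1, 12).mean(0) call. As a small sketch with invented numbers (Python 3 syntax), the monthly series is folded into one row per year, and the average is then taken down each column:

```python
import numpy as np

n_years = 3
# Invented monthly series: the value equals the month number, repeated every year.
dat_mo = np.tile(np.arange(1, 13, dtype=float), n_years)

# -1 lets numpy work out the number of rows (years) from the length of the series.
by_year = dat_mo.reshape(-1, 12)

# mean(0) averages over the rows, giving one value per calendar month.
seasonal_cycle = by_year.mean(0)
print(by_year.shape, seasonal_cycle)
```

This only works when the series starts in January and covers whole years, as is the case for the data used here.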

Matplotlib has a dedicated ticker module that handles the location and formatting
of the ticks. Even though we don't go through the details, we recommend that
everyone skim through the ticker page (http://matplotlib.org/api/ticker_api.html).

7.4. SCATTER PLOTS

Let's read the data and import the modules first:

In [281]:
    import numpy as np
    from matplotlib import pyplot as plt
    dat1 = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]
    dat2 = np.loadtxt('data/FD-Runoff_Ganges_daily_kg.txt')[:365]
    dat3 = np.loadtxt('data/FD-Evaporation_Ganges_daily_kg.txt')[:365]

Once the data is read, we can open a figure object and start adding things to it.

In [282]:
    plt.figure(figsize=(3, 4))
    plt.scatter(dat1, dat2, facecolor='blue', edgecolor=None)
    plt.scatter(dat1, dat3, marker='d', facecolor='red', alpha=0.4, linewidths=0.7)
    plt.xlabel('Precip ($kg\ d^{-1}$)')
    plt.ylabel('Runoff or ET ($\\frac{kg}{d}$)', color='k', fontsize=10)
    plt.grid(which='major', axis='both', ls=':', lw=0.5)
    plt.title('A scatter')

scatter has slightly different names for colors. The color of the marker and
of the line around it can be set separately using facecolor and edgecolor respectively. It
also allows changing the transparency using the alpha argument. Note that the width of
the line around the markers is set by linewidths (plural) in scatter, whereas plot uses
linewidth.

http://matplotlib.org/api/ticker_api.html


In [283]: plt.legend(('Runoff', 'ET'), loc='best')

7.5. PLAYING WITH THE ELEMENTS

Until now, it’s been a dull and standard plotting library. A figure comprises several
objects, which can be retrieved through various methods and then modified.
This makes customizing a figure extremely fun. Here are some examples of what
can be done.

• The ugly lines: The boxes around figures are stored as spines, a dictionary-like
object that maps each side of the box to its line and the line's properties. In the
rem_axLine function of plotTools, you can see that the linewidth of some of the
spines has been set to zero.

In [284]: import plotTools as pt
          pt.rem_axLine()

• Getting the limits of the axis from the figure: Use the gca() method of pyplot to get
the x and y limits.

In [285]: ymin, ymax = plt.gca().get_ylim()
          xmin, xmax = plt.gca().get_xlim()

• Let’s draw that 1:1 line.

In [286]: # arrow takes a start point (x, y) and displacements (dx, dy)
          plt.arrow(xmin, ymin, xmax - xmin, ymax - ymin, lw=0.1, zorder=0)

• A legendary legend: Here is an example of how flexible a legend object can be. It
has a tonne of options and methods. Sometimes, placing it becomes a matter of manual
calibration.


In [287]: leg = plt.legend(('Runoff', 'ET'), loc=(0.05, 0.914), markerscale=0.5,
                           scatterpoints=4, ncol=2, fancybox=True, handlelength=3.5,
                           handletextpad=0.8, borderpad=0.1, labelspacing=0.1,
                           columnspacing=0.25)
          leg.get_frame().set_linewidth(0)
          leg.get_frame().set_facecolor('firebrick')
          leg.legendPatch.set_alpha(0.25)
          texts = leg.get_texts()
          for t in texts:
              tI = texts.index(t)
              # t.set_color(cc[tI])
          plt.setp(texts, fontsize=7.83)
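Two of the steps above can also be done with Matplotlib alone, without the plotTools helper. This is a minimal sketch under stated assumptions: the box lines are reached through ax.spines, the plotted points are made up, and the 1:1 line is drawn with plot over a range common to both axes, so that it really has a slope of one.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
from matplotlib import pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3, 4], [2, 1, 5, 3])  # placeholder points

# ax.spines maps 'left', 'right', 'top' and 'bottom' to the lines of the box
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)

# Take a range that covers both axes, then draw the diagonal over it
xmin, xmax = ax.get_xlim()
ymin, ymax = ax.get_ylim()
lo = min(xmin, ymin)
hi = max(xmax, ymax)
one_to_one, = ax.plot([lo, hi], [lo, hi], color='gray', lw=0.5, zorder=0)

# Plotting may rescale the axes, so set the common limits explicitly
ax.set_xlim(lo, hi)
ax.set_ylim(lo, hi)
```

Using the same limits on both axes matters here: a diagonal between the corners of the plot is a 1:1 line only when the x and y ranges are identical.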

7.6. MAP MAP MAP!

This section explains the procedure to draw a map using basemap and matplotlib.

7.6.1. GLOBAL DATA

Let’s read the data that we will use to make the map. The data is stored as
big-endian plain binary. It consists of float32 values and has an unknown number of
time steps, but it has a spatial resolution of 1°.

In [288]: import numpy as np
          datfile = 'runoff.1986-1995.bin'
          data = np.fromfile(datfile, np.float32).byteswap().reshape(-1, 180, 360)
          print(np.shape(data))
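The byte order can also be declared in the dtype, which may be easier to reason about than swapping bytes after reading. This is a sketch that round-trips a small synthetic array through a temporary file: the file name example_runoff.bin and the values are hypothetical, and '>f4' is NumPy's code for big-endian 4-byte floats.

```python
import os
import tempfile
import numpy as np

# Synthetic stand-in for the runoff file: 2 time steps on a 1-degree grid
nt, nlat, nlon = 2, 180, 360
original = np.arange(nt * nlat * nlon, dtype=np.float32).reshape(nt, nlat, nlon)

# Write the values as big-endian float32, like the file described above
path = os.path.join(tempfile.mkdtemp(), 'example_runoff.bin')
original.astype('>f4').tofile(path)

# '>f4' tells NumPy the file is big-endian, whatever the host byte order is;
# -1 lets reshape infer the number of time steps from the file size
data = np.fromfile(path, dtype='>f4').reshape(-1, nlat, nlon)

print(data.shape)
```

Because the number of time steps is inferred, the reshape fails with an error if the file size is not a multiple of 180 x 360 values, which is a useful sanity check on the assumed grid.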

Once the data is read, a map object should first be created using the basemap module.

In [289]: from mpl_toolkits.basemap import Basemap
          _map = Basemap(projection='cyl',
                         llcrnrlon=lonmin,
                         urcrnrlon=lonmax,
                         llcrnrlat=latmin,
                         urcrnrlat=latmax,
                         resolution='c')

http://matplotlib.sourceforge.net/basemap/doc/html/

1. Set the projection and resolution of the background map:

✓ resolution: specifies the resolution of the map. 'c' (crude), 'l' (low),
'i' (intermediate), 'h' (high), 'f' (full) or None can be used.

✓ The longitude and latitude of the lower left corner and upper right corner can
be specified by llcrnrlon, llcrnrlat, urcrnrlon and urcrnrlat:

✓ llcrnrlon: LONgitude of Lower Left hand CoRNeR of the desired map.

✓ llcrnrlat: LATitude of Lower Left hand CoRNeR of the desired map.

✓ urcrnrlon: LONgitude of Upper Right hand CoRNeR of the desired map.

✓ urcrnrlat: LATitude of Upper Right hand CoRNeR of the desired map.

In the current case, the latitude and longitude of the lower left and upper right
corners of the map are set to the following values:

In [290]: latmin = -90
          lonmin = -180
          latmax = 90
          lonmax = 180

2. To draw coastlines, country boundaries and rivers:

In [291]: _map.drawcoastlines(color='k', linewidth=0.8)

✓ draws coastlines in black with a linewidth of 0.8.

In [292]: _map.drawcountries(color='brown', linewidth=0.3)

✓ draws country boundaries.

In [293]: _map.drawrivers(color='navy', linewidth=0.3)

✓ draws major rivers.

3. To add longitude and latitude labels:

In [294]: latint = 30
          lonint = 30
          parallels = np.arange(latmin + latint, latmax, latint)
          _map.drawparallels(parallels, labels=[1, 1, 0, 0], dashes=[1, 3],
                             linewidth=.5, color='gray', fontsize=3.33, xoffset=13)
          meridians = np.arange(lonmin + lonint, lonmax, lonint)
          _map.drawmeridians(meridians, labels=[1, 1, 1, 0], dashes=[1, 3],
                             linewidth=.5, color='gray', fontsize=3.33, yoffset=13)

✓ arange: Defines an array of latitudes (parallels) and longitudes (meridians)
to be plotted over the map. Because np.arange excludes the stop value and the start is
offset by one interval, in the above example the parallels are drawn from 60°S to 60°N
every 30°, and the meridians from 150°W to 150°E every 30°.

✓ color: Color of parallels (meridians).

✓ linewidth: Width of parallels (meridians). If you want to draw only the axis
labels and don't want to draw parallels (meridians) on the map, linewidth should
be 0.

✓ labels: List of 4 values (default [0,0,0,0]) that control whether parallels
are labelled where they intersect the left, right, top or bottom of the plot. For
example, labels=[1,0,0,1] will cause parallels to be labelled where they intersect the
left and bottom of the plot, but not the right and top.

✓ xoffset: Distance of latitude labels from the vertical axis.

✓ yoffset: Distance of longitude labels from the horizontal axis.
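The positions passed to drawparallels and drawmeridians can be checked directly, since they are plain NumPy arrays. This short sketch repeats the np.arange calls from the example with the same corner values and intervals, and prints the result; no map is needed.

```python
import numpy as np

# Same corner values and intervals as in the map example
latmin, latmax, latint = -90, 90, 30
lonmin, lonmax, lonint = -180, 180, 30

# np.arange includes the start value and excludes the stop value
parallels = np.arange(latmin + latint, latmax, latint)
meridians = np.arange(lonmin + lonint, lonmax, lonint)

print(parallels.tolist())
print(meridians.tolist())
```

With these settings no line is requested at the poles or at ±180°, which coincide with the edges of this global map.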

In the example program, the lines and ticks around the map are also removed by:

In [295]: import plotTools as pt
          pt.rem_axLine(['right', 'bottom', 'left', 'top'])
          pt.rem_ticks()

Now the data are plotted over the map object as:

In [296]: from matplotlib import pyplot as plt
          fig = plt.figure(figsize=(9, 7))
          ax1 = plt.subplot(211)
          _map.imshow(np.ma.masked_less(data.mean(0), 0.), cmap=plt.cm.jet,
                      interpolation='none', origin='upper', vmin=0, vmax=200)
          plt.colorbar(orientation='vertical', shrink=0.5)
          ax2 = plt.axes([0.18, 0.1, 0.45, 0.4])
          data_gm = np.array([np.ma.masked_less(_data, 0).mean() for _data in data])
          plt.plot(data_gm)
          data_gm_msc = data_gm.reshape(-1, 12).mean(0)
          pt.rem_axLine()
          ax3 = plt.axes([0.72, 0.1, 0.13, 0.4])
          plt.plot(data_gm_msc)
          pt.rem_axLine()
          plt.show()

✓ A subplot can be combined with axes in a figure. In this case, the global mean
of runoff and its mean seasonal cycle are plotted in the axes ax2 and ax3, respectively.
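The masking and averaging used for the two line plots can be followed on a tiny synthetic array. This is a sketch with made-up numbers: a 2 x 3 grid stands in for the global grid, the field equals the month number, and -9999 is assumed as the negative fill value for missing cells.

```python
import numpy as np

# Synthetic stand-in: 24 monthly time steps on a tiny 2 x 3 grid
nt, nlat, nlon = 24, 2, 3
month_value = (np.arange(nt) % 12 + 1).astype(np.float32)
data = np.tile(month_value[:, None, None], (1, nlat, nlon))
data[:, 0, 0] = -9999.0  # assumed fill value marking a missing cell

# Mask negative values, then average the remaining cells of each time step
data_gm = np.array([np.ma.masked_less(_data, 0).mean() for _data in data])

# Fold the series into (years, 12) and average over the years
data_gm_msc = data_gm.reshape(-1, 12).mean(0)

print(data_gm[:3])
print(data_gm_msc[:3])
```

Note that this is an unweighted mean over grid cells, exactly as in the example above; the masked fill values do not enter the average.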

7.6.2. CUSTOMIZING A COLORBAR

• To specify the orientation of the colorbar:

In [297]: plt.colorbar()

✓ the default orientation is a vertical colorbar on the right side of the main plot.

In [298]: plt.colorbar(orientation='horizontal')

✓ will make a horizontal colorbar below the main plot.

• To specify the fraction of the total plot area occupied by the colorbar:

In [299]: plt.colorbar()

✓ the default fraction is 0.15 (see Fig. ??).

In [300]: plt.colorbar(fraction=0.5)

✓ 50% of the plot area is used by the colorbar (see Fig. ??).

• To specify the ratio of length to width of the colorbar:

In [301]: plt.colorbar(aspect=20)

✓ length:width = 20:1.
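The options above can be combined in a single call. This is a minimal sketch on a synthetic field, not the runoff data: the array field and the label text are made up, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import numpy as np
from matplotlib import pyplot as plt

# Synthetic 1-degree field (hypothetical values)
field = np.arange(180 * 360, dtype=float).reshape(180, 360)

fig, ax = plt.subplots()
img = ax.imshow(field, cmap=plt.cm.jet)

# Horizontal colorbar using 10% of the axes area, with length:width = 30:1
# and shrunk to 80% of its default length
cbar = fig.colorbar(img, ax=ax, orientation='horizontal',
                    fraction=0.1, aspect=30, shrink=0.8)
cbar.set_label('synthetic values')
```

Passing the image and the axes explicitly, rather than relying on the current image, keeps the colorbar attached to the intended plot when a figure has several axes.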

Various other colormaps are available in python. Fig. 7.1 shows some commonly
used colormaps and their names. More details of the options for colorbar can be
found at:

http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.colorbar

Figure 7.1: Some commonly used colormaps

For a list of all the colormaps available in python, see:

http://matplotlib.sourceforge.net/plot_directive/mpl_examples/pylab_examples/show_colormaps.hires.png