Quantitative Methods of Data Analysis
Natalia Zakharova, TA
Bill Menke, Instructor

Post on 13-Dec-2015


Lecture 2

MatLab Tutorial and Issues Associated with Coding


MatLab Fundamentals


Most important Data Types

Numerical:

Scalars – single value

Vectors – Column or row of values

Matrices – two-dimensional tables of values

Text:

Character string


Scalars

A number you enter

a = 1.265;

A predefined number

b = pi;

The result of a calculation

c = a*b;

Vectors

MatLab can manipulate both column-vectors and row-vectors.

But my advice to you is: only use column-vectors, because it's so easy to introduce a bug by doing an operation on one that should have been done on the other.

Use the transpose operator ' to immediately convert any row-vector that you must create into a column-vector.

A column-vector:

1.4
2.3
0.1

A row-vector:

9.1, 7.1, 4.2, 8.9

Transpose Operator

Swaps the rows and columns of an array, so that

1
2
3
4

becomes [ 1, 2, 3, 4 ] (and vice versa)

Standard mathematical notation: a^T

MatLab notation: a'
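A minimal sketch of the ' operator in action, using made-up values. The variable names r, c and back are illustrative only. For real-valued data, as here, ' simply swaps rows and columns; for complex data it also takes the complex conjugate.

```matlab
r = [1, 2, 3, 4];      % a row-vector, size 1 x 4
c = r';                % converted to a column-vector, size 4 x 1
size(r)                % displays the dimensions of r
size(c)                % displays the dimensions of c
back = c';             % applying ' a second time recovers the row-vector
isequal(back, r)       % displays 1 (TRUE) when the two are identical
```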

Vector

A vector you enter (note the immediate conversion to a column-vector):

a = [1.88, 7.22, 5.31, 7.53]';

The result of a calculation:

b = 2 * a;

The result of a function call:

c = sort(a);

Matrix

A matrix you enter:

A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];

That's the matrix

1 4 7
2 5 8
3 6 9

by the way …

The result of a calculation:

B = 2 * A;

The result of a function call:

C = zeros(3,3);

Character strings

You type in a quoted sequence of characters:

s = 'hi there';

Occasionally, the result of a function call:

capS = upper(s);

That's 'HI THERE', by the way …

Arithmetic

a = 2;               % a scalar
b = 2;               % a scalar
c = [1, 2, 3]';      % a column-vector
d = [2, 3, 4]';      % another column-vector
M = [ [1,0,0]', [0,2,0]', [0,0,3]' ];   % the matrix shown below

1 0 0
0 2 0
0 0 3

e = a*b;             % a scalar
f = c'*d;            % the dot-product, a scalar
g = M*d;             % a column-vector
h = d'*M*d;          % a scalar

Normal rules of linear algebra apply, which means that the type of the result depends critically on what's on the right-hand side, and on its order! Lots of room for bugs here!
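To make the point about order concrete, here is a small sketch that reuses the values of c and d from this slide. The names f and G are illustrative. Moving the ' from the first vector to the second changes the result from a scalar to a matrix.

```matlab
c = [1, 2, 3]';    % a column-vector, 3 x 1
d = [2, 3, 4]';    % another column-vector, 3 x 1

f = c' * d;        % (1 x 3) times (3 x 1): the dot-product, a scalar
G = c * d';        % (3 x 1) times (1 x 3): the outer product, a 3 x 3 matrix

size(f)            % displays the dimensions of f
size(G)            % displays the dimensions of G

% Note: c * d (no transpose at all) is not a valid matrix product of two
% 3 x 1 vectors, and MatLab raises a dimension error for it.
```

Both lines run without complaint, so a misplaced ' produces no warning, only a result of the wrong type.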

Element access

Suppose

A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];

1 4 7
2 5 8
3 6 9

Then A(2,3) is the element at row=2, col=3, which is 8.

b = A(2,3);      % sets b to 8
A(2,3) = 10;     % resets A(2,3) to 10

1 4 7
2 5 10
3 6 9

Then A(:,2) is the second column of A:

b = A(:,2);      % b is [4, 5, 6]'

And A(3,:) is the third row of A:

c = A(3,:);      % c is [3, 6, 9]; but we agreed, no row-vectors
d = A(3,:)';     % d is [3, 6, 9]'

More on :

Suppose

A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];

1 4 7
2 5 8
3 6 9

Then A(1:2,1:2) extracts a range of rows and columns:

1 4
2 5

Note that a quick way to make a vector with regularly-spaced elements is:

dx = 0.01;
N = 100;
t = dx*[1:N]';

which gives the column-vector with elements 0.01, 0.02, …, 0.99, 1.00.

Logical functions

MatLab assigns TRUE the value 1 and FALSE the value 0, so

( 1 > 2 ) equals 0
( 1 < 2 ) equals 1

a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';
b = (a>=4);      % b is [0, 0, 0, 1, 1, 1, 0, 0, 0]'

sum( (a>=4) ) is the number of elements in the vector a that are equal to or greater than 4.
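As a small extension of the counting idea, the sketch below turns the count into a percentage and also uses the logical vector to pick out the matching elements. The names n, pct and big are illustrative, and selecting elements with a logical vector goes slightly beyond what the slide covers.

```matlab
a = [1, 2, 3, 4, 5, 4, 3, 2, 1]';
b = (a >= 4);                  % 1 where the test is TRUE, 0 elsewhere

n = sum(b);                    % number of elements that pass the test
pct = 100 * n / length(a);     % the same count, expressed as a percentage
big = a(b);                    % only the elements of a that pass the test
```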

Logical tests

Blocks of MatLab code that are executed only when a test is true.

One handy use is turning on or off bits of code intended primarily for debugging.

Here it gets plotted:

doplotone=1;
if (doplotone)
    plot(t,d);
end

Here it doesn't:

doplotone=0;
if (doplotone)
    plot(t,d);
end

To Loop or Not to Loop

a=[1, 2, 3, 4, 3, 2, 1]';
b=[3, 2, 1, 0, 1, 2, 3]';
N=length(a);

Dot product using MatLab syntax:

c = a'*b;

Dot product using a loop:

c = 0;
for i = 1:N
    c = c + a(i)*b(i);
end

You should avoid loops except in cases where

No MatLab syntax is available to provide the functionality in a simpler way

Available MatLab syntax is so inscrutable that a loop more clearly communicates your intent
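One more illustration of the trade-off, as a sketch: differencing consecutive elements of a vector, first with a loop and then with the : syntax. The values in d are made up, and r1 and r2 are illustrative names.

```matlab
d = [5, 7, 12, 10, 8, 7]';     % made-up series of daily values
N = length(d);

% loop version: each element minus the one before it
r1 = zeros(N-1, 1);
for i = 2:N
    r1(i-1) = d(i) - d(i-1);
end

% MatLab-syntax version: subtract two shifted ranges of the same vector
r2 = d(2:N) - d(1:N-1);

isequal(r1, r2)                % displays 1 (TRUE): the two versions agree
```

Which version communicates the intent more clearly is a judgment call; the second has fewer places for an indexing mistake to hide.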

A Tutorial using the Neuse River Hydrograph

Rain falls and the river rises; the discharge quickly increases.

After the rain, the river falls; the discharge slowly decreases.

[Figure: sketches of rain vs. time and discharge vs. time]

So, is the river more often falling than rising?

What would constitute an appropriate analysis?

Find, for the 11-year period, the percent of days on which the discharge is increasing*, and compare it to 50%.

Make a histogram of the rate of increase and decrease of discharge and see whether it is centered around zero or some other number.

* Rising today if today's discharge minus yesterday's discharge is positive.

Steps

Import the Neuse hydrograph data
Convert units to those we're most familiar with
Plot discharge vs time, examine it for errors
Compute discharge rate (today minus yesterday)
Plot rate vs time, examine it for errors
Count up % of days rate is positive
Output the % of days
Compute histogram of rates and plot it

Trick: work first with a subset of the data
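The steps above can be sketched roughly as follows. This is not the course's script: the import step is replaced by a small SYNTHETIC hydrograph built in a loop, because the name and layout of the real data file are not given here. The rain days, the decay factor of 0.8, the base flow of 100, the assumption that the raw units are cubic feet per second, and all variable names are hypothetical.

```matlab
% SYNTHETIC stand-in for the hydrograph; NOT the real Neuse River data
N = 60;                          % number of days in this small test record
t = [1:N]';                      % time in days, as a column-vector
rainDays = [5, 20, 21, 40]';     % hypothetical days on which rain falls
dcfs = 100 * ones(N, 1);         % discharge, assumed to be in cubic feet per second
for i = 2:N
    % slow decay toward a base flow of 100 after each rain
    dcfs(i) = 100 + 0.8 * (dcfs(i-1) - 100);
    if any(rainDays == i)
        dcfs(i) = dcfs(i) + 500; % fast rise on a rain day
    end
end

% convert units (1 cubic foot = 0.028316846592 cubic meters)
d = 0.028316846592 * dcfs;       % discharge in cubic meters per second

doplot = 0;                      % set to 1 to turn the plots on
if (doplot)
    plot(t, d);                  % discharge vs time, to examine for errors
end

% discharge rate: today minus yesterday
rate = d(2:N) - d(1:N-1);
if (doplot)
    figure; plot(t(2:N), rate);  % rate vs time, to examine for errors
end

% percent of days on which the rate is positive
pct = 100 * sum(rate > 0) / length(rate);
fprintf('discharge is rising on %.1f%% of days\n', pct);

% histogram of rates, in 20 bins
[counts, centers] = hist(rate, 20);
if (doplot)
    figure; bar(centers, counts);
end
```

By construction, this synthetic river rises on only the 4 rain days out of 59 day-to-day differences, so the printed percentage is far below 50%. That says nothing about the real Neuse River; it only confirms that the code does what it was built to do.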


Finding and Using Documentation

The MatLab Web Site is one place where you can get a description of syntax, functions, etc.


Can be very useful in finding exactly what you want if you’ve only found something close to what you want!

Example 1: the LENGTH command

Example 2: the SUM command

. . . (two more pages below)

Some commands have long, complicated explanations. But that's because they can be applied to very complicated data objects. Their application to a vector is usually short and sweet.
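A short sketch of how simple the two example commands are when applied to a vector, with a matrix included for contrast. The values and the names total, avg, colSums and L are illustrative. Typing help length or help sum at the MatLab prompt prints the same kind of description found on the web site.

```matlab
a = [1.88, 7.22, 5.31, 7.53]';   % a small column-vector

N = length(a);         % number of elements in the vector
total = sum(a);        % sum of all the elements
avg = total / N;       % their average

% Applied to a matrix, the same commands behave less obviously:
A = [ [1,2,3]', [4,5,6]', [7,8,9]' ];
colSums = sum(A);      % sums down each column, giving a row-vector
L = length(A);         % the size of the largest dimension of A
```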


Coding Advice

Advice #1

Think about what you want to do before starting to type in code!

Block out the necessary steps on a piece of scratch paper.

Without some forethought, you can code for an hour, and then realize that what you're doing makes no sense at all.

Advice #2

Sure, cannibalize a program to make a new one …

But keep a copy of the old one …

And make sure the names are sufficiently different that you won't confuse the two …

Advice #3

Be consistent in the use of variable names.

amin, bmin, cmin, minx, miny, minz: guaranteed to cause trouble

Don't use variable names that can be too easily confused, e.g., xmin and minx.

(Especially important because it can interact disastrously with MatLab's automatic creation of variables. A misspelled variable becomes a new variable.)
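A sketch of how this goes wrong in practice, using made-up values. The intent is to track the smallest element of x in xmin, but one assignment is mistyped as minx. MatLab creates minx on the spot and reports no error.

```matlab
x = [3, 8, 5]';        % made-up data
xmin = 10;             % starting value, larger than anything in x

for i = 1:length(x)
    if x(i) < xmin
        minx = x(i);   % BUG: meant xmin; MatLab silently creates minx
    end
end

xmin                   % still 10: the intended variable never changed
minx                   % holds the last element that passed the test, not the minimum
correct = min(x);      % the built-in min function gives the intended answer
```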

Advice #4

Build code in small sections, and test each section thoroughly before going on to the next.

Make lots of plots.


Advice #5

Test code on smallish simple datasets before running it on a large complicated dataset

Build test datasets with known properties. Test whether your code gives the right answer!
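For instance, a test dataset for a percent-of-days-rising calculation might look like the sketch below. The sawtooth values are made up so that the right answer is known in advance: the series rises on exactly 4 of its 8 day-to-day steps. The names rate, pct and expected are illustrative.

```matlab
% test dataset with a known property: rises on 4 of 8 steps, so 50%
d = [1, 2, 3, 2, 1, 2, 3, 2, 1]';

% the code being tested
rate = d(2:end) - d(1:end-1);
pct = 100 * sum(rate > 0) / length(rate);

% compare against the known answer
expected = 50;
if (abs(pct - expected) < 1e-12)
    disp('test passed');
else
    disp('test FAILED');
end
```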


Advice #6

Don’t be too clever!

Inscrutable code is very prone to error.

Advice #7

Use comments to communicate the BIG PICTURE

% c is the dot product of a and b
c = 0;
for i = 1:N
    c = c + a(i)*b(i);
end

% set c to zero
c = 0;
% loop from one to N
for i = 1:N
    % add a times b to c
    c = c + a(i)*b(i);
% end of the loop
end

Which set of comments gives you the most sense of what's going on?

Advice #8

BUGS – DON'T MAKE THEM

(an ounce of prevention is worth a pound of cure)

Practices that reduce the likelihood of bugs are almost always worthwhile, even though they may seem to slow you down a bit …

They save time in the long run, since you will spend much less time debugging …

By the way, cutting-and-pasting code, especially when it must then be modified by changing variable names, is a frequent source of bugs, even though it's so tempting …