CHAPTER 8 Managing and Curating Data

Upload: arline-phelps

Post on 25-Dec-2015


The Second Step: Storing and Curating Data

Storage: Temporary and Archival

Permanent archives: The only medium acceptable as truly archival is acid-free paper.

Electronic storage: Do not expect electronic media to last more than 5-10 years. They should be used primarily for working copies. If they are used, copy datasets onto newer electronic media on a regular basis.


Curating Data

Most ecological and environmental data are collected by researchers using funds obtained through grants and contracts

They are technically owned by the granting agency, and they need to be made widely available (e.g., on the Internet)

Unfortunately, when budgets are cut, data management and curation costs are often the first items to be dropped

The Final Step: Transforming the Data

Transformation

A mathematical function that is applied to all of the observations of a given variable: $Y^* = f(Y)$, where $Y^*$ is the transformed value

Most transformations are fairly simple algebraic functions, subject to the restriction that they are continuous monotonic functions

Because they are monotonic, they DO NOT change the rank order of the data, but they DO change the relative spacing of the observations
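
As a minimal sketch of these two properties (the sample values are made up for illustration and the helper names `rank_order` and `gaps` are hypothetical), a base-10 log transformation leaves the rank order of the observations unchanged while compressing the gaps between the larger values:

```python
import math

# Hypothetical observations of a variable Y (illustrative values only)
y = [1.0, 10.0, 100.0, 1000.0]

# Apply the same monotonic function to every observation: Y* = log10(Y)
y_star = [math.log10(v) for v in y]


def rank_order(values):
    """Return the indices that would sort the values in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def gaps(values):
    """Return the differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


print(rank_order(y) == rank_order(y_star))  # rank order before vs. after
print(gaps(y))       # spacing between raw observations
print(gaps(y_star))  # spacing between transformed observations
```

Comparing the two `gaps` lists shows how the transformation changes relative spacing even though the ordering is identical.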

Why Transform Data?

(1) Patterns in the transformed data may be easier to understand and communicate than patterns in the raw data (e.g., converting curves into straight lines)

(2) Necessary for analysis to be valid – “meeting the assumptions”

The Species-Area Relationship: A classic example

If we plot the number of species against the area of the island, the data often follow a simple power function, $S = cA^z$, where

S = number of species
A = island area
c and z = constants fitted to the data

The Species-Area Relationship: A classic example

Island         Area (km^2)   No. of species   Log10 (Area)   Log10 (Species)
Albemarle           5824.9              325          3.765             2.512
Charles              165.8              319          2.220             2.504
Chatham              505.1              306          2.703             2.486
James                525.8              224          2.721             2.350
Indefatigable       1007.5              193          3.003             2.286
Abingdon              51.8              119          1.714             2.076
Duncan                18.4              103          1.265             2.013
Narborough           634.6               80          2.803             1.903
Hood                  46.6               79          1.668             1.898
Seymour                2.6               52          0.415             1.716
Barrington            19.4               48          1.288             1.681
Gardner                0.5               48         -0.301             1.681
Bindloe              116.6               47          2.067             1.672
Jervis                 4.8               42          0.681             1.623
Tower                 11.4               22          1.057             1.342
Wenman                47                 14          1.672             1.146
Culpepper              2.3                7          0.362             0.845

The Species-Area Relationship

[Figure: Number of Species (y-axis, 0 to 400) plotted against Island Area in km^2 (x-axis, 0 to 7000)]

The Species-Area Relationship

If species richness and island area are related by a power function, we can linearize the equation by taking logarithms of both sides

$\log(S) = \log(cA^z)$

$\log(S) = \log(c) + z\log(A)$
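
Once the relationship is linear, c and z can be estimated from a straight-line fit. The sketch below is only an illustration: it assumes an ordinary least-squares fit (the slides do not say how the constants were fitted), it uses base-10 logarithms as in the island table, and the function name `fit_line` is hypothetical. The areas and species counts are the 17 islands listed in the table.

```python
import math

# (area in km^2, number of species) for the 17 islands in the table
islands = [
    (5824.9, 325), (165.8, 319), (505.1, 306), (525.8, 224),
    (1007.5, 193), (51.8, 119), (18.4, 103), (634.6, 80),
    (46.6, 79), (2.6, 52), (19.4, 48), (0.5, 48),
    (116.6, 47), (4.8, 42), (11.4, 22), (47.0, 14), (2.3, 7),
]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


log_area = [math.log10(a) for a, _ in islands]
log_species = [math.log10(s) for _, s in islands]

# log(S) = log(c) + z*log(A): the intercept estimates log(c), the slope estimates z
log_c, z = fit_line(log_area, log_species)
c = 10 ** log_c  # back-transform the intercept to get c on the original scale

print(f"z = {z:.3f}, log10(c) = {log_c:.3f}, c = {c:.1f}")
```

Fitting on the log scale and then back-transforming the intercept is one common way to recover c; a nonlinear fit of the untransformed power function would be an alternative and need not give identical estimates.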

The Species-Area Relationship

[Figure: log10(Number of Species) (y-axis, 0.6 to 2.6) plotted against log10(Island Area) (x-axis, -1 to 4)]

Other Transformations

Cube-root transformation ($Y^* = \sqrt[3]{Y}$): used for measures of mass or volume ($Y^3$) that are allometrically related to linear measures of body size or length ($Y$)

Logarithmic transformation of both X and Y: used to examine relationships between two measures of mass or volume ($Y^3$)

Why Transform Data? Statistics Demands It

All statistical tests require data to fit certain mathematical assumptions

Examples

Analysis of Variance: (1) the data must be homoscedastic (equal variances among groups)

(2) residuals must be normal random variables

Regression: (1) normally distributed residuals that are uncorrelated with the independent variable

Five Common Transformations

(1) Logarithmic Transformation

(2) Square-root Transformation

(3) Angular (or arcsine) Transformation

(4) Reciprocal Transformation

(5) Box-Cox Transformation

Logarithmic Transformation

Replaces each observation with its logarithm: $Y^* = \log(Y)$

Often equalizes variances for data in which the mean and variance are positively correlated; such data also tend to have outliers and positively skewed residuals

The logarithm of 0 is not defined, so when the data contain zeros, add 1 to each observation before transforming: $Y^* = \log(Y + 1)$
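
A minimal sketch of the log transformation with the add-one adjustment for zeros. The counts are invented for illustration, and the use of base-10 logarithms is an assumption (the slide does not specify a base):

```python
import math

# Hypothetical counts containing a zero and one large outlier
counts = [0, 1, 3, 7, 12, 250]

# log10(0) is undefined, so add 1 to every observation first: Y* = log10(Y + 1)
transformed = [math.log10(y + 1) for y in counts]

for y, y_star in zip(counts, transformed):
    print(f"{y:>4} -> {y_star:.3f}")

# Back-transform by reversing both steps: Y = 10**Y* - 1
recovered = [10 ** y_star - 1 for y_star in transformed]
```

Note that the back-transformation has to subtract the 1 that was added, otherwise the reported values are shifted by one unit.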

Square-root Transformation

Replaces each observation with its square root: $Y^* = \sqrt{Y}$

Used most frequently for count data, which often follow a Poisson distribution

Yields a variance that is independent of the mean

Because $\sqrt{0} = 0$, the transformation leaves data values equal to 0 unchanged; add some small number to each observation so that the zeros are transformed as well
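
The claim that the square root yields a variance independent of the mean can be checked by simulation. This is a rough sketch, not part of the original slides: the Poisson sampler (Knuth's multiplication method), the seed, the three means, and the sample size are all choices made here for illustration, and `poisson_sample` is a hypothetical helper.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(1)  # fixed seed so the run is repeatable

results = {}
for mean in (5, 20, 40):
    counts = [poisson_sample(mean, rng) for _ in range(5000)]
    roots = [math.sqrt(y) for y in counts]
    # Store (variance of raw counts, variance of square-root counts)
    results[mean] = (statistics.variance(counts), statistics.variance(roots))
    print(mean, results[mean])
```

In runs like this the variance of the raw counts grows with the mean, while the variance of the square-root-transformed counts stays roughly constant, which is the behavior the transformation is meant to produce.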

Arcsine Transformation (also called arcsine-square root or angular)

Replaces each observation with the arcsine of the square root of the value

$Y^* = \arcsin(\sqrt{Y})$

Principally used for proportions

Removes the dependence of the variance on the mean

Gives transformed data in units of radians, not degrees
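
A short sketch of the arcsine-square root transformation, using made-up proportions. Python's `math.asin` returns radians, which matches the note above; the conversion to degrees is shown only in case degrees are wanted for presentation.

```python
import math

# Hypothetical proportions (each must lie between 0 and 1)
proportions = [0.0, 0.05, 0.25, 0.5, 0.75, 1.0]

# Y* = arcsine(sqrt(Y)); math.asin returns the angle in radians
radians = [math.asin(math.sqrt(p)) for p in proportions]

# Convert to degrees only if degrees are wanted for reporting
degrees = [math.degrees(r) for r in radians]

for p, r, d in zip(proportions, radians, degrees):
    print(f"{p:.2f} -> {r:.4f} rad ({d:.1f} deg)")
```

The transformed values run from 0 (for a proportion of 0) to pi/2 radians (for a proportion of 1).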

Reciprocal Transformation

Replaces each value with its reciprocal: $Y^* = 1/Y$

Commonly used for data that record rates, which often appear hyperbolic
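
To illustrate why the reciprocal helps with hyperbolic data, the sketch below assumes a hypothetical rate that follows Y = a/X exactly (the constant a = 12 and the X values are invented). After the transformation, the points fall on the straight line Y* = X/a.

```python
# Hypothetical hyperbolic relationship: rate Y = a / X (a = 12 chosen for illustration)
a = 12.0
x = [1.0, 2.0, 3.0, 4.0, 6.0]
y = [a / xi for xi in x]

# Reciprocal transformation: Y* = 1 / Y
y_star = [1.0 / yi for yi in y]

# On the transformed scale, the slope between successive points is constant (1 / a)
points = list(zip(x, y_star))
slopes = [(q[1] - p[1]) / (q[0] - p[0]) for p, q in zip(points, points[1:])]
print(slopes)
```

Real rate data will not be exactly hyperbolic, so the transformed points would scatter around a line rather than sit on it. The transformation is also undefined for observations equal to 0.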

Box-Cox Transformation: A family of transformations

$Y^* = (Y^\lambda - 1)/\lambda$ (for $\lambda \neq 0$)

$Y^* = \log_e(Y)$ (for $\lambda = 0$)

$L = -\frac{\nu}{2}\log_e(s_T^2) + (\lambda - 1)\frac{\nu}{n}\sum \log_e(Y)$

$\nu$ = degrees of freedom
$n$ = sample size
$s_T^2$ = variance of the transformed values of Y

Box-Cox Transformation

The value of $\lambda$ that maximizes the log-likelihood function $L$ is used in the transformation equation ($Y^* = (Y^\lambda - 1)/\lambda$, or $Y^* = \log_e(Y)$ when $\lambda = 0$) to provide the closest fit of the transformed data to a normal distribution

The equation for $L$ must be solved iteratively (trying different values of $\lambda$ until $L$ is maximized) using computer software
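
The iterative search can be sketched as a simple grid search. This is one possible approach under stated assumptions, not the procedure of any particular statistics package: the degrees of freedom are taken as n - 1, the grid runs from -2 to 2 in steps of 0.01, the data are invented, and the function names (`box_cox`, `log_likelihood`, `best_lambda`) are hypothetical.

```python
import math
import statistics


def box_cox(y, lam):
    """Box-Cox transform: (Y**lam - 1)/lam for lam != 0, ln(Y) for lam == 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def log_likelihood(y, lam):
    """L = -(v/2) ln(s2_T) + (lam - 1)(v/n) sum(ln Y), assuming v = n - 1."""
    n = len(y)
    v = n - 1
    s2_t = statistics.variance(box_cox(y, lam))  # variance of transformed values
    sum_log = sum(math.log(val) for val in y)
    return -(v / 2) * math.log(s2_t) + (lam - 1) * (v / n) * sum_log


def best_lambda(y, low=-2.0, high=2.0, step=0.01):
    """Try each lambda on a grid and keep the one that maximizes L."""
    steps = round((high - low) / step)
    grid = [round(low + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(y, lam))


# Hypothetical right-skewed, strictly positive observations
y = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5]
lam = best_lambda(y)
print(lam, box_cox(y, lam)[:3])
```

A grid search is easy to follow but coarse; numerical optimizers can refine the estimate. The observations must be strictly positive for the transformation and the logarithms to be defined.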

Box-Cox Transformation

When $\lambda = 1$, the transformation $Y^* = (Y^\lambda - 1)/\lambda$ is linear
When $\lambda = 1/2$, it is a square-root transformation
When $\lambda = -1$, it is a reciprocal transformation
When $\lambda = 0$, the transformation $Y^* = \log_e(Y)$ is a natural logarithmic transformation
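
These special cases follow from substituting each value of lambda into the Box-Cox equation. A short worked version is given here as a reading aid; the last line uses the standard limit of $(Y^\lambda - 1)/\lambda$ as $\lambda$ approaches 0:

```latex
\begin{align*}
\lambda = 1:            &\quad Y^* = \frac{Y^{1} - 1}{1} = Y - 1 \\
\lambda = \tfrac{1}{2}: &\quad Y^* = \frac{Y^{1/2} - 1}{1/2} = 2\left(\sqrt{Y} - 1\right) \\
\lambda = -1:           &\quad Y^* = \frac{Y^{-1} - 1}{-1} = 1 - \frac{1}{Y} \\
\lambda \to 0:          &\quad Y^* = \lim_{\lambda \to 0} \frac{Y^{\lambda} - 1}{\lambda} = \log_e(Y)
\end{align*}
```

In each case the result is a linear function of $Y$, $\sqrt{Y}$, $1/Y$, or $\log_e(Y)$, so it differs from the corresponding simple transformation only by a shift and a rescaling.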

Box-Cox Transformation

ALWAYS try using simple arithmetic transformations FIRST

If the data are right-skewed, try the familiar transformations from the series $1/\sqrt{Y}$, $\sqrt{Y}$, $\ln(Y)$, $1/Y$

If the data are left-skewed, try $Y^2$, $Y^3$, etc.

[Figure: legend entries Original, Logarithmic, Square Root, Arcsine, Reciprocal; x-axis 0 to 1, y-axis 0 to 1]

Reporting Results

You should report results in the original units, which requires back-transforming the transformed values

The back-transformed mean can be very different from the arithmetic mean of the raw data

Also, back-transformations will normally result in asymmetrical confidence intervals
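
Both points can be seen in a small example. The sketch below assumes a natural-log transformation and invented data, and it uses a normal quantile for the interval purely for brevity (a t quantile would be more appropriate for a sample this small):

```python
import math
import statistics

# Hypothetical right-skewed observations in their original units
y = [2.0, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]

# Analyse on the natural-log scale
logs = [math.log(v) for v in y]
mean_log = statistics.mean(logs)
se_log = statistics.stdev(logs) / math.sqrt(len(logs))

# Approximate 95% interval on the log scale (normal quantile: an assumption)
q = statistics.NormalDist().inv_cdf(0.975)
lower_log, upper_log = mean_log - q * se_log, mean_log + q * se_log

# Back-transform the mean and the interval limits to the original units
back_mean = math.exp(mean_log)  # this is the geometric mean of y
lower, upper = math.exp(lower_log), math.exp(upper_log)

print(f"arithmetic mean       = {statistics.mean(y):.2f}")
print(f"back-transformed mean = {back_mean:.2f}")
print(f"interval              = ({lower:.2f}, {upper:.2f})")
print(f"distance below / above the mean: {back_mean - lower:.2f} / {upper - back_mean:.2f}")
```

The interval is symmetric around the mean on the log scale, but after back-transformation the upper limit lies farther from the mean than the lower limit does.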

Back-Transformations

Logarithmic – antilog($Y^*$), i.e., $10^{Y^*}$ for base-10 logarithms or $e^{Y^*}$ for natural logarithms

Square Root – $(Y^*)^2$

Arcsine – $(\sin(Y^*))^2$

Reciprocal – $1/Y^*$
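
As a quick check, each back-transformation should return the original value when applied to the output of its transformation. In this sketch the arcsine case is back-transformed as the square of the sine, and the test value 0.36 is an arbitrary proportion chosen so that every transformation is defined:

```python
import math

# Each entry pairs a transformation with its back-transformation
pairs = {
    "log10":       (math.log10,                        lambda t: 10 ** t),
    "natural log": (math.log,                          math.exp),
    "square root": (math.sqrt,                         lambda t: t ** 2),
    "arcsine":     (lambda y: math.asin(math.sqrt(y)), lambda t: math.sin(t) ** 2),
    "reciprocal":  (lambda y: 1 / y,                   lambda t: 1 / t),
}

y = 0.36  # hypothetical observation; a proportion so that every transform applies

# Transform, then back-transform; each result should be (numerically) y again
round_trip = {name: back(forward(y)) for name, (forward, back) in pairs.items()}
for name, value in round_trip.items():
    print(f"{name:<12} {value:.6f}")
```

If a constant was added before transforming (for example 1 before taking logarithms), it must also be subtracted after back-transforming.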

Reporting Results

Lastly, transforming the data should be added to your audit trail (documented in the metadata)

Create a new spreadsheet for the transformed data and store it on permanent media