
Post on 21-Dec-2015


Variability and statistical tests

Where does the variability come from?
• Instrumental measurements
• Biology
– Genotype
– Environment
– Ootype
– Experimental factors

• Randomly fluctuating
• Gradually changing in time: drift

Errors
– Random: not reproducible
– Systematic: reproduced in a particular setting
– Major: something crucial has been overlooked in the experiment

Depending on context, these can be classified into factors of:
1. Core research interest
2. Satellite interest
3. Nuisance

Variables

• Nominal: yellow, blue, green…
• Ordinal: small, big
• Interval: 0…10, 11…20, 21…30 etc.
• Ratio: p/N
• Continuous: 3.1415926…
• Discrete: 4, 7, 11
• Binary: 0 or 1, “Yes” or “No”
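The variable types above can be sketched as plain Python values; the concrete numbers and labels here are invented for illustration:

```python
# Hypothetical examples of each variable type, as Python values.
nominal = "yellow"             # unordered category
ordinal = ("small", "big")     # categories with an order but no distance
interval = 15                  # falls into the 11…20 bin
ratio = 7 / 50                 # p/N: a proportion of successes
continuous = 3.1415926         # any real value
discrete = 7                   # countable values only
binary = True                  # 0 or 1, "Yes" or "No"

print(nominal, ordinal, interval, ratio, continuous, discrete, binary)
```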

Signal vs. noise

One can introduce a continuous variable,…

[Figure: X–Y plot; Class 1 and Class 2 separated along a continuous variable]

…discrete classes (~bins, levels etc.) for one variable,…

[Figure: X–Y plot; X split into Class Xa and Class Xb]

…or discrete classes for both variables.

[Figure: X–Y plot; both variables split into classes (Class Y1 and Class Y2 shown on Y)]

Quantify it!
We should figure out which factors are most relevant to the phenomenon being studied. An example:
1. Age σ2(Age)
2. Sex σ2(Sex)
3. Genotype σ2(Genotype)
4. Measurement differences σ2(Measurement)
5. Experimental conditions σ2(Condition)

Thus, the general linear model:

Y = μ + σ2(Age) + σ2(Sex) + σ2(Genotype) + σ2(Measurement) + σ2(Condition) + ε

• Y: response of the system
• μ: grand mean
• σ2: variance from the factor
• ε: error (correctly speaking, residual or unexplained variance!)
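The variance attributed to a single factor, e.g. σ2(Age), is commonly tested with a one-way ANOVA. A minimal sketch using NumPy and SciPy, on simulated data (the group names, effect sizes, and sample sizes are all made up):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Simulated response Y = grand mean + an Age-group effect + noise.
mu = 10.0
age_effects = {"young": -1.0, "old": +1.0}  # hypothetical effects
groups = {g: mu + eff + rng.normal(0, 0.5, size=30)
          for g, eff in age_effects.items()}

# One-way ANOVA tests H0: sigma^2(Age) = 0 (the Age factor "does not work").
F, p = f_oneway(groups["young"], groups["old"])
print(f"F = {F:.1f}, p = {p:.2g}")
```

A small p rejects H0, i.e. Age explains a non-zero share of the variance in Y.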

In other words, to capture a signal

y = f(x)

(an example signal: “the higher x, the better y”), a formalization is needed.

Different methods:

• Work with different data (both factors and responses)

• Have different power in different conditions (sample size, data type, design topology)

• Answer different questions (defined via null hypotheses)

• Provide different amounts of supplementary output (graphs, tables etc.)

What are variables?

Variables are things that we measure, control, or manipulate in research. They differ in the role they are given in our research and in the way they are measured.

Correlational vs. experimental research

• In correlational research we do not influence any variables but only measure them

• In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level.

However, “correlation-like” techniques may still be applied to experimental data. And due to the better quality of the experimental setting, they can then provide qualitatively better information.

Dependent vs. independent variables

• Independent variables ARE MANIPULATED in the experiment

• Dependent ones ARE NOT MANIPULATED

• Independent variables shape the experiment

• Dependent variables measure its result

Relations between variables

• Distributed in a consistent manner

• Systematically correspond to each other

Do not forget the noise!

Features of relations

• Two basic features of every relation between variables: “magnitude” (~strength) and “reliability” (~confidence, or significance). The two are not totally independent.

Statistical significance

p-level: the probability of obtaining the observed relation by pure chance when it does NOT actually exist
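This can be checked by simulation: when no relation exists, about 5% of datasets still produce p < 0.05 by chance. A sketch using NumPy and SciPy's two-sample t-test (sample sizes and the number of simulations are arbitrary):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Under H0 (no relation): both samples come from the same distribution.
# The p-values are then uniform, so about 5% of null datasets still
# fall below the 0.05 threshold -- these are the false positives.
pvals = [ttest_ind(rng.normal(size=20), rng.normal(size=20)).pvalue
         for _ in range(2000)]
false_positive_rate = float(np.mean(np.array(pvals) < 0.05))
print(f"false positives at alpha = 0.05: {false_positive_rate:.3f}")
```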

Null hypothesis

• Null hypothesis H0 which we test:
– Is the reference point in the analysis
– States that “The factor does not work” (or “The relation does not exist”)
– Its rejection proves (at some probability) that the factor does work (“is likely to work”)!

In the tests we are going to consider, the null hypothesis H0: σ2(The factor) = 0 almost always has an equality condition!

• How to determine that a result is "really" significant?

• How is the "level of statistical significance" calculated?

• Can "no relation" be a significant result? Only after a test on the general population!

• How to measure the magnitude (strength) of relations between variables? (regression)

• Common "general format" of most statistical tests.

Why are stronger relations between variables more significant?

The stronger the relation, the higher the chance that it will exceed the noise. Thus, the relation is easier to prove.
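A quick simulation illustrates this: with the same noise and sample size, a strong relation yields far smaller p-values than a weak one. The effect sizes and counts below are arbitrary choices for the sketch (NumPy + SciPy assumed):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

def mean_p(effect, n_sim=200, n=25):
    """Average p-value of a two-sample t-test at a given effect size."""
    return float(np.mean([ttest_ind(rng.normal(effect, 1, n),
                                    rng.normal(0, 1, n)).pvalue
                          for _ in range(n_sim)]))

p_weak_mean = mean_p(0.2)    # weak relation: often lost in the noise
p_strong_mean = mean_p(2.0)  # strong relation: easily exceeds the noise
print(f"weak: {p_weak_mean:.3f}  strong: {p_strong_mean:.2g}")
```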

• Description of the established relations:
– Strong?
1. Absolutely
2. Relative to other relations
– Confident?
• By different tests
– Robust? What happens if:
• we change the method?
• the distribution changes its shape?

In the general linear model

Y = μ + σ2(Age) + σ2(Sex) + σ2(Genotype) + σ2(Measurement) + σ2(Condition) + ε

each of the σ2 terms can be questioned. Moreover, their particular combinations can be studied:

Y = μ + … + σ2(Age X Sex) + … + σ2(Sex X Genotype) + σ2(Age X Genotype X Condition) + … + ε

Examples:
“Does the disease prognosis deteriorate with age equally for men and women?”
H0: σ2(Age X Sex) = 0
“Is the reaction of genotype AbC particularly difficult to detect when measuring with tool Z?”
H0: σ2(Genotype X Measurement) = 0
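An interaction such as σ2(Age X Sex) can be pictured as a “difference of differences” between the group means. A NumPy sketch on simulated data, where the invented effect makes ageing worsen the outcome for men only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated Y with a hypothetical Age x Sex interaction:
# ageing shifts Y by +2 for men and not at all for women.
n = 200
cell_effect = {("old", "M"): 2.0, ("old", "F"): 0.0,
               ("young", "M"): 0.0, ("young", "F"): 0.0}
means = {cell: float(np.mean(rng.normal(10 + eff, 1, n)))
         for cell, eff in cell_effect.items()}

# If sigma^2(Age x Sex) = 0, ageing shifts Y equally for both sexes,
# so the difference of the two age effects should be near zero.
age_effect_m = means[("old", "M")] - means[("young", "M")]
age_effect_f = means[("old", "F")] - means[("young", "F")]
interaction = age_effect_m - age_effect_f
print(f"age effect in men: {age_effect_m:.2f}, in women: {age_effect_f:.2f}, "
      f"interaction: {interaction:.2f}")
```

In practice such terms are tested with a two-way ANOVA rather than by eye, but the quantity being tested is exactly this difference of differences.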

Pearson correlation

• Kind of problems that it solves: describe the strength of the relation between two variables
• Suitability: normally distributed data
• Basic theory: sums of squares
• Addressable questions: Are X and Y related?
• Not addressable questions:
• Format of input data:
• How to run:
• Interpretation of the results:
• Presentation of the results:
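A minimal sketch of running a Pearson correlation, using SciPy's `pearsonr` on simulated, normally distributed data (the slope and noise level are made up):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)

# Two normally distributed variables with a linear relation plus noise.
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(0, 0.5, size=100)

# r measures the strength of the linear relation; the p-value tests
# H0: X and Y are not (linearly) related.
r, p = pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.2g}")
```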
