lab 02 fa17 statistics - university of hawaii · inferential statistics lab 2 fall 2017 2-1...

INFERENTIAL STATISTICS

Lab 2

Fall 2017 2-1

Reminders! • Bring a memory stick

Key Concepts
• Inferential Statistics
• Student’s t-Test
• Regression & Correlation

Student Learning Outcomes

After Labs 1 & 2, students will be able to:

1. manage data in Microsoft Excel by creating tables and graphs (column charts, scatter plots, histograms).

2. describe and explore data with basic statistical functions (mean, median, range, variance, confidence intervals, etc.).

3. test scientific hypotheses using inferential statistics and explain how these results relate to biology.

4. compare two datasets using the Student’s t-test and infer relationships among data using linear regression and correlation.

I. INFERENTIAL STATISTICS

Thus far, you have learned about statistical measures that describe one set of data. What if you want to determine whether one population is significantly different from another? Let’s say we are interested in the size of rock wallabies that were introduced to Hawaii compared with those in their native Australia. We catch a number of wallabies in Hawaii and in Australia and compare their average sizes. It would be a little odd if the averages were exactly the same. The important question, though, is whether the difference in average size is due to normal variation in wallaby size or whether the two populations actually grow to different sizes. How different are the averages? When do we say that a difference in size is outside the range of normal variability? How large is the difference between the means compared with the variability (variance) within each population? These are the questions that statistics will help us answer, and answering them is a critical aspect of doing any science.

Returning to our tree fern height example, we might want to compare the height of tree ferns at two different locations. We would measure a sample of trees from each population and compare the mean tree heights. It would be highly unlikely to obtain exactly the same sample mean at both locations. But is the difference due to chance or to some other effect? In this example, inferential statistics is about determining whether the difference between two data sets is due to normal variability and/or random chance, or whether some variable is causing a difference. Our null hypothesis is that the variable has no effect, while our alternative hypothesis is that the variable has an effect. In this lab, we will first illustrate the basics again using simple examples such as student height (e.g., are women really shorter than men?). You will also work with real data collected by field ecologists during their studies, carefully consider the hypotheses, and perform statistical analyses on the data.

Student’s t-Test

The Student’s t-test is one of the most common statistical tests used to compare two sample means. The null hypothesis (H0) of the t-test states that the two population means are equal, i.e., there is no difference between them (H0: µ1 = µ2). The alternative hypothesis (Ha) states that the two population means are different (Ha: µ1 ≠ µ2). The t-test tells you whether you should reject H0 in favor of Ha. The Student’s t-statistic (ts) and the degrees of freedom (df) are used with the t-table (Appendix D-1) to find the P-value: the probability of obtaining a difference between sample means at least as large as the one observed if H0 were true. If this probability is less than 5% (P < 0.05), you reject the H0 and
conclude that the two population means are significantly different. While it would not be difficult to calculate ts by hand, most people leave such calculations to a computer or calculator. If you do need to conduct a t-test by hand, find the appropriate critical value (tcrit) in Appendix D-1 and compare it with your calculated ts. To get the appropriate tcrit, use α = 0.05 and read down the table to the appropriate degrees of freedom, which should be n1 + n2 – 2. If the absolute value of ts is greater than tcrit, then P is less than 0.05 and you reject H0. If the variances differ between the two samples, the degrees of freedom may have to be reduced. When calculating confidence intervals and probabilities using the t-table, two main assumptions are made:

1. The frequency distributions of the populations being sampled are approximately normal, so that a histogram of the data looks like a bell-shaped curve, as shown in Figure 5B (Lab 1).

2. The variances of the two populations being compared are about the same. This is likely to be true if the sample variances (s2) from the two sampled populations are similar in magnitude.
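The by-hand procedure above can be sketched in Python. This is a minimal illustration of the pooled-variance (equal variances) form of Student’s t-test, which matches assumption 2; the function name `pooled_t_test` is hypothetical, and `tcrit` still has to be looked up in Appendix D-1 for α = 0.05 and df = n1 + n2 – 2.

```python
import math
import statistics


def pooled_t_test(sample1, sample2, tcrit):
    """Student's two-sample t-test assuming equal variances (hypothetical helper).

    tcrit is the two-tailed critical value for alpha = 0.05 and
    df = n1 + n2 - 2, looked up in a t-table such as Appendix D-1.
    Returns (ts, df, reject_h0).
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Sample variances s2 (n - 1 in the denominator)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    ts = (mean1 - mean2) / se_diff
    # Reject H0 (mu1 = mu2) when |ts| exceeds the critical value
    return ts, df, abs(ts) > tcrit
```

For two samples of four observations each, df = 6, and the usual two-tailed t-table value for α = 0.05 at df = 6 is 2.447, so the call would be `pooled_t_test(sample1, sample2, tcrit=2.447)`. This is the same pooled-variance test that Excel’s “t-Test: Two-Sample Assuming Equal Variances” performs.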

Luckily, the t-test is fairly robust to violations of these two assumptions. For the most part, you do not need to worry about these assumptions in this course, but you should be aware that there are many other statistical tests that can substitute for the t-test when these assumptions are violated in major ways.

Inferring Cause and Effect with Experiments

If you want to attribute a difference between two populations to a specific factor, you could set up an experiment in which all factors are equal except the factor (variable) of interest. For example, you may wish to test the hypothesis that temperature affects the population growth rate of copepods (microscopic crustaceans). In your experiment, you could grow copepods in 10 water-filled jars, with 5 jars kept at a high temperature and the other 5 kept at a lower temperature, keeping all other variables such as light, nutrients, and water volume constant. After 10 days, you could measure the growth rate in each jar and calculate the mean for the high- and low-temperature treatments. If the difference in mean growth rate is significant using a t-test (P ≤ 0.05), you can infer that temperature affects the growth rate of the copepods. At this point we should define two new terms: the factor being measured is the dependent variable, and the factor being manipulated is the independent variable. In the copepod example, growth rate was measured for each jar (= dependent variable), while temperature was manipulated (= independent variable).

Regression & Correlation

Two other types of inferential statistics are regression and correlation. Regression and correlation analyses allow you to look for relationships between factors (variables). In this case, our null hypothesis is that the value of one variable is not related to the value of the other. Simple regression is used to identify a functional relationship (equation) that best describes the dependence of one variable on another. A scatter plot of tree fern height versus age is shown in Figure 1.

Figure 1. Scatterplot of tree fern height versus age.

There appears to be a linear relationship, and we can use linear regression to determine the equation of the line that best describes the
relationship. Linear regressions are characterized by the equation:

y = bx + a,

where y is the dependent variable* (*see note below), x is the independent variable, b is the slope of the line, and a is the y-axis intercept. The regression equation and regression line for the tree fern data are shown in Figure 2. The coefficient of determination, r2, indicates the proportion of the variability in the dependent variable (height) that can be attributed to its linear relationship with the independent variable (age). r2 can range from 0 to 1: 0 indicates no linear dependence, while 1 indicates that all points fall exactly on the line. A t-test can be conducted on the regression to test whether the slope is significantly different from zero. The P-value in Figure 2 indicates that there is a statistically significant linear relationship between height and age.

Figure 2. Output from regression analysis, with best-fit line.

* In performing regression analyses, you will often have a preconceived idea that one factor may depend on changes in another factor. In our example, age does not depend on height. Rather, tree height probably depends on age. In statistical lingo, height is the dependent variable because it depends on the other variable (age). The dependent variable always belongs on the y-axis.

Regressions can be useful for predicting the value of one variable (e.g., age) from a measurement of the other variable (height). For instance, if you wish to predict the age of a tree fern that is 2 m tall, simply plug the height (y) into the regression equation and solve for x (age). You will see that the predicted age of the tree fern is about 40 years. A confidence interval can also be calculated to determine the precision of this estimate. Using the Trendline feature in Excel, you can insert a best-fit regression line, display the equation of the line, and display the coefficient of determination (r2). The Trendline feature won’t calculate the P-value or the confidence interval of the regression slope, but you will be provided with a custom Excel template that does this for you.

In our regression analysis, we may suspect a cause-and-effect relationship between age and height, but the regression cannot prove this. Other unmeasured factors, such as young trees being more likely to be chewed by feral pigs, may be responsible for the observed relationship between height and age. It is possible that, in the absence of these pigs, the 15-year-old tree ferns would have reached the maximum height of 3 m observed in our oldest tree ferns. If this were the case, pigs would be an example of a confounding factor that affected our interpretation of the data (see Lab 3). We would need to design further experiments to distinguish these two effects.

Correlation is used to measure the strength of association between two variables when we don’t have a good idea about whether one variable is directly responsible for changes in the other. Correlation is typically measured by the correlation coefficient, r (note: r × r = r2, the coefficient of determination). r measures the strength and direction of the linear association between the two variables being compared and can range from –1 to 1 (Figure 3). r = 1 indicates a perfect positive correlation between the two variables, i.e., as one variable increases, so does the other; values close to 1 indicate a strong positive correlation. r = –1 indicates a perfect negative correlation, i.e., as one variable increases, the other decreases. r = 0 indicates no linear correlation between the two variables, i.e., there is no linear relationship between the value of one variable and the value of the other.

Figure 3. Positive, negative and zero correlation are illustrated above.

Let’s consider the tree fern heights measured earlier. In the process of measuring the heights, we noticed that tall tree ferns seemed to have thicker trunks (Table 1).

Table 1. Sample data of tree fern age, height, and basal area.

Age (years)   Height (m)   Basal Area (m2)
10            0.5          0.025
25            1.2          0.053
30            1.5          0.052
40            2.3          0.110
60            3.0          0.180

Trunk size can often be estimated from basal area, which is the cross-sectional area of the trunk at chest height. To determine whether there really was a strong association between height and basal area, we went back and measured the basal area of the trunks (Table 1) and then performed a correlation analysis (Figure 4).

Figure 4. Correlation between tree fern height and basal area.

Notice that the correlation coefficient (r) is close to 1, indicating a strong positive association between height and basal area. The P-value is less than 0.05, indicating that this is a statistically significant correlation (allowing us to reject the null hypothesis that there is no relationship between height and basal area). Note that a low value of r does not necessarily imply the absence of a relationship. A low value of r may simply indicate a lot of scatter in the data, or a relationship that is not linear. As with the other statistics, we use the P-value to determine whether the relationship (correlation) is significant. A t-test is used to assess whether r is significantly different from zero. The computer will calculate this for you, but the formula is provided below for your information:

$$ t = \frac{|r|}{s_r}, \quad \text{where} \quad s_r = \sqrt{\frac{1 - r^2}{n - 2}} \quad \text{and} \quad df = n - 2 $$

The null hypothesis (H0) is r = 0. The P-value for the calculated t is found in Appendix D-1, using df = n – 2. If P is less than or equal to 0.05, r is considered significantly different from 0, indicating a significant correlation.

P, r, & r2

P should not be confused with r and r2. r and r2 are not measures of statistical significance; they only describe the data, as do means, medians, and variances. A P-value tells us, for example, the probability of observing a difference between two means at least as large as the one measured if the difference were due to chance alone. Depending on that probability, we either reject or fail to reject specific null hypotheses, such as H0: mean A = mean B, or H0: r = 0.

III. STATISTICS WITH EXCEL, continued

Data, Data Analysis… t-Tests

You can conduct t-tests in Excel. Click Data Analysis on the Data tab. In the Data Analysis window, select t-Test: Two-Sample Assuming Equal Variances.

In the t-Test window, you must specify the two samples to be tested (Variable 1 Range and Variable 2 Range). You must also enter the Output Range. If you included the column headings when specifying the two samples to be tested, then you must also check the Labels check box. When you are ready, click OK.

The output includes the results of both a one-tailed and a two-tailed t-test. Consider only the output for the two-tailed t-test and use the corresponding p-value (done ahead of time). If you want to know more about one-tailed and two-tailed t-tests, ask your TA. In the above Excel report, two samples (x1 and x2) are compared to determine whether there is a significant difference between their means. To decide whether they are different, look at the p-value to determine whether or not to reject H0.
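Excel’s two-tail p-value is the probability, under H0, of a t-statistic at least as extreme as ts in either direction. The sketch below illustrates that idea with only the Python standard library by integrating the t-distribution density numerically (Simpson’s rule); the helper names `t_pdf` and `two_tailed_p` are hypothetical, and this is an illustration rather than a replacement for Excel or a statistics package. The one-tail p-value is half the two-tail value when the difference lies in the predicted direction.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(ts, df, steps=2000):
    """P(|T| >= |ts|), using Simpson's rule on [0, |ts|] (steps must be even)."""
    upper = abs(ts)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3   # area under the curve between 0 and |ts|
    return 1 - 2 * central_half    # area in both tails beyond -|ts| and +|ts|


# ts at the two-tailed 5% critical value for df = 6 should give P close to 0.05
p_two = two_tailed_p(2.447, df=6)
p_one = p_two / 2
print(f"two-tail P = {p_two:.4f}, one-tail P = {p_one:.4f}")
```

This also shows why comparing |ts| with tcrit and comparing P with 0.05 lead to the same decision: ts equal to tcrit corresponds to P of 0.05.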

Excel Report (sample t-test: two-sample assuming equal variances)

If the p-value (two-tail) is less than 0.05, you reject H0 and conclude that there is a significant difference between x1 and x2.

Regression & Correlation With X-Y Scatter Plots

Creating an x-y scatter plot (see Lab 1) is the first step towards analyzing the regression or correlation of two variables. Once you have created the scatter plot, Excel can add a best-fit line or curve, give you the equation of the best-fit line, and calculate the r2 value. To add a best-fit line, select the graph by clicking on it. On the Design tab (under Chart Tools), click Add Chart Element, select Trendline, and then click Linear.

The Trendline, or “best-fit line,” is the regression or correlation line. To format your trendline, click on it in your chart. A pane with several options opens on the right. To display the equation of your trendline, select the bar-chart icon and check Display Equation on chart and Display R-squared value on chart.

If the relationship between the two variables is linear, you can use the Regression-Correlation Excel spreadsheet to determine the p-value. This spreadsheet will be given to you by your TA. If the relationship is not linear, the template cannot calculate a valid p-value unless the data have first been transformed to make the relationship linear. This is the case in Assignment C, where the natural log of each density observation has been calculated (log-transformed). This transformation makes the relationship linear, allowing the regression template to calculate a p-value. Simply use Ln(Density) instead of Density in the regression template.
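The log transformation is a one-line calculation per observation (the equivalent of applying Excel’s LN() function to each cell). This sketch recomputes the Ln(Density) column of Table 3 from the Density column. As an illustrative assumption only: if density changed roughly exponentially with altitude, i.e., density ≈ A·e^(k·altitude), then ln(density) = ln(A) + k·altitude would be a straight line, which is why transformed values suit a linear regression template.

```python
import math

# Table 3: density (#/m2) of L. leucocephala and the tabulated Ln(Density) values
density = [0.2000, 0.3125, 0.3000, 0.1250, 1.1250, 2.8250, 3.6875, 3.5830, 10.5750]
ln_table = [-1.609438, -1.163151, -1.203973, -2.079442, 0.117783,
            1.038508, 1.304949, 1.276293, 2.358493]

# Natural log of each observation (every density must be > 0 for this to work)
ln_density = [math.log(d) for d in density]

for d, mine, tab in zip(density, ln_density, ln_table):
    print(f"{d:8.4f}  ln = {mine:9.6f}  (table: {tab:9.6f})")
```

The recomputed values agree with the table to about four decimal places; the small difference in the 3.5830 row presumably reflects rounding of that density value in the table.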

IV. LABORATORY EXERCISES & ASSIGNMENTS

1. Haole Koa Pods

As a class, record the following information for each haole koa pod from two different locations on Wa‘ahila Ridge:
a) length in cm
b) number of seeds

Your TA will show you how to obtain descriptive statistics from this dataset, how to test whether there is a difference between pods from higher (plot 8) and lower (plot 2) elevation using a t-test, and how to test whether there is a correlation between the length of a pod and its number of seeds.

2. Student Height

With your lab partner, check whether the average student height in your class is the same for men and women.

• What is the H0?
• What test do you use?
• What data do you need to report?

Check your results with your TA.

3. Group Exercise

Break into four groups. Each group looks at one of the following studies and reports to the class:
A. Endemic Hawaiian Mussels
B. Laysan Finches
C. Wa‘ahila Ridge Data
D. Settlement Behavior of Tubeworm Larvae

1. Summarize the study (background, H0, Ha, etc.)

2. Explain which statistical tests / descriptive statistics you will use and why

3. Demonstrate to the class how to do those statistics in Excel (using mock data)

4. Assignment (35 Pts.)

For your assignment in this laboratory, complete questions 1-13. For all Excel spreadsheet data, save the files first on the desktop and then on your memory stick, or email them to yourself; otherwise you risk losing your files and all the work you just did. Your Tables and Figures should be able to “stand on their own” (as also emphasized in Biology 171 Lab), meaning that they should make sense without the reader needing to read your assignment. Tables and Figures should include axis labels, legends (if there is more than one series in the graph), a short description (caption), and be numbered sequentially (e.g., Table 3-1, Table 3-2, etc.). Tables and Figures must also be cited in the text of your assignment (e.g., “Table 4-1 summarizes the results of…”), or they will be ignored and not counted (your TA should not have to hunt around to find the right Table or Figure in your report). Follow these guidelines for all lab assignments this semester. Using the basic statistical analyses you have learned in this laboratory, answer all questions at the end of each section (A-D).

Always include the H0 you tested, the statistical test you used, the p-value, and the degrees of freedom (df), as well as other statistical measures if needed (or as requested by your TA).

A. Endemic Hawaiian Mussels (7 pts.)

Background

Tom Smalley (a former UH Zoology graduate student) was interested in whether individuals of Brachidontes crebristriatus, an endemic marine mussel, grow larger at sheltered, productive sites (Kahana Bay) than at exposed, less-productive sites (Diamond Head). One might expect mussels to be bigger at productive sites because more food is available. However, the water in the bay is more stagnant, so less oxygen may be available, which could reduce feeding activity. At Diamond Head, not only is the water more oxygenated, but it also flows over the mussels faster. The net result is that more food is delivered to the mussels. To determine whether mussels were bigger at Kahana Bay than at Diamond Head, Smalley took a number of samples from each site. Each sample consisted of the mean shell length (mm) of all mussels in a randomly located 25 cm2 quadrat. For the analysis, the mean for each quadrat has been calculated (FYI: the number of individuals per quadrat ranged from 10 to 40). The resulting data are shown in Table 4.

Table 4. Shell lengths (mm) of mussels sampled in various quadrats at Diamond Head and Kahana Bay.

Diamond Head   Kahana
8.8            15.4
10.0           12.4
10.3           13.9
9.6            9.6
9.5            15.4
10.2           11.0
8.1            11.4
8.3            9.6
10.5           10.2

Questions

1. Regarding shell lengths at Diamond Head and Kahana Bay:
a. Are mussels at Kahana Bay significantly different in size from those at Diamond Head? How would you test this? (Be sure to report all relevant data from your statistical test.) (2 pts.)

b. What are the 95% confidence intervals (CI) for the mean shell lengths at the two sites? (1 pt.) Do the 95% CI overlap at the two sites? (1 pt.)

c. What are the implications of this, i.e. what does this mean in biological terms? (1 pt.)

2. Based on the above results:

Can you draw any conclusions about the effects of productive vs. unproductive sites in general? (1 pt.) Why or why not? (1 pt.)

B. Laysan Finches (9 pts.)

Background

Sheila Conant, Emeritus Professor of UH Biology, and her former graduate student, Marie Morin, have been studying the endangered Laysan Finch. Originally, the birds were found only on Laysan Island (northwest of the main Hawaiian Islands). A few were brought to Pearl and Hermes Reef by the U.S. Fish and Wildlife Service to start a second population. The agency reasoned that a single hurricane could potentially wipe out the species on Laysan Island; having two separate populations of these rare birds in different places would therefore reduce the probability of extinction (see Conant, S., 1988, “Geographic variation in the Laysan Finch (Telespyza cantans),” Evolutionary Ecology 2:270-282). Dr. Conant was interested in whether the translocated finches on Pearl and Hermes Reef had evolved to be different from the Laysan Island population. Specifically, because food sources differed at the two locations, she hypothesized that natural selection would favor different beak shapes at the two locations. Over several years, Dr. Conant and colleagues made many measurements on birds of all ages and of both sexes. You will examine only the beak width of adult female birds captured in 1987 on Southeast and North Islands (at Pearl and Hermes Reef) and on Laysan Island. “Beak width” is the width of the upper mandible at the distal end of the nares, perpendicular to the mouth line. Beak width was measured with calipers to the nearest 0.001 cm and recorded in cm.

Questions

3. Compare the mean and median beak widths of the birds sampled on each island (Table 2). A large difference between the mean and median would suggest a skewed distribution (see Figure 5 in Lab 1). Do the islands appear to be similar or different in terms of the direction and degree of skew in beak size (i.e., the difference between mean and median)? (1 pt.) Explain. (1 pt.)

4. Perform the appropriate statistical test to answer the following question: Is there a statistically significant difference in mean beak width between the two islands? (1 pt.) Describe the difference, if there is any. (1 pt.) (Be sure to report all relevant data from your statistical test.)

5. a. How might food differences between islands affect beak shape over many generations? (1 pt.)
b. Based on these beak measurements, can we conclusively state that food differences are responsible for beak differences? (1 pt.)
c. Are there other possible explanations? Give an example. (1 pt.)

6. How might you be able to determine whether the differences in beak averages are biologically important (as opposed to statistically significant)? (2 pts.)

Table 2. Measurements of beak width in cm for birds sampled on Southeast and Laysan Islands.

Southeast Laysan 0.698 0.738 0.724 0.757 0.745 0.754 0.742 0.766 0.748 0.761 0.755 0.772 0.754 0.768 0.764 0.788 0.754 0.777 0.77 0.736 0.755 0.814 0.778 0.752 0.759 0.712 0.728 0.758 0.772 0.74 0.744 0.766 0.777 0.754 0.755 0.774 0.782 0.762 0.765 0.791 0.782 0.768 0.772 0.737 0.783 0.777 0.782 0.753 0.787 0.816 0.73 0.759 0.788 0.721 0.746 0.766 0.789 0.742 0.756 0.775 0.798 0.754 0.766 0.801

0.8 0.762 0.772 0.738 0.801 0.77 0.784 0.754 0.807 0.777 0.732 0.759 0.853 0.702 0.75 0.766

0.775

C. Wa‘ahila Ridge Data (11 pts.)

Background

Each fall, BIOL 265L students collect data on vegetation patterns on Wa‘ahila Ridge. Each lab section collects data from a particular elevation on the ridge. The data for the entire class are combined and analyzed for patterns in species distribution with respect to environmental characteristics. Table 3 gives data for the legume tree Leucaena leucocephala, also called “haole koa.” Ln(Density) gives the tree density values log-transformed using the natural log.

Questions

7. Examine the relationship between the density of L. leucocephala and altitude.
a. Make an X-Y scatterplot of the relationship between density and altitude. (1 pt.) Is the relationship positive or negative? (1 pt.) Using the ln-transformed density data, determine whether the relationship is statistically significant. (1 pt.) (Be sure to report all relevant data from your statistical test.) (Use the Regression-Correlation template.) Explain whether regression analysis or correlation analysis is more appropriate. (1 pt.)

8. Examine the relationship between the density of L. leucocephala and soil potassium levels.
a. Make an X-Y scatterplot of the relationship between density and potassium. (1 pt.) Is the relationship positive or negative? (1 pt.) Using the ln-transformed density data, determine whether the relationship is statistically significant. (1 pt.) (Be sure to report all relevant data from your statistical test.) (Use the Regression-Correlation template.) Explain whether regression analysis or correlation analysis is more appropriate. (1 pt.)

9. Consider the relationship between soil potassium and plant density. Propose a hypothesis (Ha) that could explain the relationship and state the corresponding null hypothesis (H0). (1 pt.)

10. On Wa‘ahila Ridge, there is a trend towards increasing rainfall with increasing elevation (this is a fact that does not need to be tested in your proposed experiment below). L. leucocephala seems to do poorly in the driest environments (this is a casual observation; we are not actually sure about this). Describe an experiment you could use to evaluate the hypothesis that the relationship between L. leucocephala density and altitude is related to water availability. (Max. 4 sentences) (2 pts.)

Table 3. Density of Leucaena leucocephala (haole koa) with respect to elevation and soil potassium concentration along Wa‘ahila Ridge. The natural log transform of density is provided for linear correlation or regression analysis.

Altitude (ft)   Potassium Concentration Index   Density (#/m2)   Ln(Density)
365             3.0                             0.2000           -1.609438
410             3.1                             0.3125           -1.163151
450             4.2                             0.3000           -1.203973
465             3.6                             0.1250           -2.079442
486             4.6                             1.1250            0.117783
550             4.4                             2.8250            1.038508
575             4.5                             3.6875            1.304949
630             5.2                             3.5830            1.276293
700             4.5                             10.5750           2.358493

D. Settlement Behavior of Tubeworm Larvae (8 pts.)

Background

BIOL 301L (“Marine Ecology and Evolution”) students have collected the following data on the larval settlement behavior of the tubeworm Hydroides elegans. Hydroides elegans is a serpulid polychaete (serpulids are often referred to as fan worms or feather duster worms) and an important member of the fouling community in Hawaii. The fouling community is diverse and is composed of several marine invertebrate taxa, as well as algae, that colonize submerged surfaces, including ship hulls and pilings. Adults of H. elegans are attached to submerged surfaces and therefore rely on their free-swimming larvae to find and choose a suitable surface to settle on. Dr. Michael Hadfield (Zoology faculty at UH Kewalo Marine Laboratory) has shown that the bacteria making up a biofilm on the settlement surface affect the settlement of H. elegans larvae. Table 5 gives data on bacterial density and the percentage of settled H. elegans larvae according to the age of the biofilm.

Questions

11. a. What statistical test would you use to analyze these data? Why? (1 pt.)
b. What are possible H0 and Ha of the test? (1 pt.)

12. Examine the relationship between the age of biofilm and bacterial density (i.e., absorbance).
a. Make an X-Y scatterplot of the relationship between age of biofilm and absorbance. Is the relationship positive or negative? (1 pt.) Determine whether the relationship is statistically significant. (1 pt.) (Be sure to report all relevant data from your statistical test.) (Use the Regression-Correlation template.) Explain whether regression analysis or correlation analysis is more appropriate. (1 pt.)

13. Examine the relationship between age of biofilm and % larval settlement.
a. Make an X-Y scatterplot of the relationship between age of biofilm and % settlement. Is the relationship positive or negative? (1 pt.) Determine whether the relationship is statistically significant. (1 pt.) (Be sure to report all relevant data from your statistical test.) (Use the Regression-Correlation template.) Explain whether regression analysis or correlation analysis is more appropriate. (1 pt.)

Table 5. Age of biofilm (number of submerged days), absorbance of biofilm (indirect measure for bacterial density) and % settlement of H. elegans larvae.