A Technical Review of the USFS National Woodland Owner Survey


Executive Summary

The National Woodland Owner Survey (NWOS) has been conducted on a nation-wide basis since 1978 (Dickinson & Butler 2013). For four decades, the NWOS has been considered the official census of the family forest owners in the U.S. and is regarded as the social complement to the biophysical forest inventories conducted through the Forest Inventory and Analysis (FIA).

    This survey has driven both administrative decisions and conservation policy and, as a result, has shaped public perception of privately-owned forests in America (Butler et al 2016b). The NWOS is the largest and most comprehensive study of its kind.

    The following is a peer review to assess the methodologies employed in the NWOS study design and, by extension, the validity of the 2011-2013 results and report titled Who Owns America’s Trees, Woods, and Forests?, published by the U.S. Forest Service (USFS) Northern Research Station.

A thorough review of the NWOS, and associated literature and technical documents, revealed concerns regarding the scientific validity of the NWOS due to significant departures from standard business practices of social science and statistical analysis. The paradigm upon which this review is based is the reduction of Total Survey Error (TSE), popularized by Dillman and colleagues, which is the accepted practice of the American Association for Public Opinion Research (Biemer 2010; Dillman et al 2014).

    Key Concerns Regarding the Methodology of the NWOS

It is difficult to ascertain which factor contributes the most to TSE and to the misrepresentation of owners of large acreages; however, the following are the factors most likely to contribute to misunderstandings in the data:

    1. Coverage error

A large contributor to TSE is coverage error, meaning that some eligible participants have no chance of being selected to take the survey. Some coverage error is introduced if the acreage selected from each hexagon is forest but is not detected in the remote sensing process. Coverage error is exacerbated during the process of matching the acreage to owners using public tax records (i.e., acreage identified by GIS that has no corresponding information in the tax records). Because of the multiple steps necessary to convert FIA data into a sample frame for the NWOS study, coverage error is a considerable factor in TSE, but a lack of documentation prevents a complete peer review of this process. Future iterations of the NWOS may consider conducting the research independently of the FIA research effort to find a less problematic way to investigate private forest owners.

    2. Nonresponse bias

Another contributor to TSE is nonresponse bias, meaning that those who responded to the survey are different from those who did not respond. Standard practice for survey research is to initially collect data using the mode of data collection that experts believe will lead to the best response (in the NWOS case, a mail survey). Then, researchers use an alternate mode of data collection on the people who did not respond to the original mail survey to measure nonresponse bias (in this case, a telephone survey). The mail responses are contrasted with the telephone responses to produce a nonresponse weighting to account for the nonresponse bias. In the NWOS, two of the six variables showed there was indeed a nonresponse bias; nevertheless, no nonresponse bias adjustment was applied. The telephone nonresponse bias check responses were instead added to the mail and web responses, which is unconventional in survey research, particularly since the telephone survey contained only 22% of the original mail questionnaire. Had the telephone nonresponse bias check been used in the conventional manner, the cooperation rate would decline from 52% to 45%. Further, the NWOS inappropriately combined different modes of collection (mail, web, phone) without conducting statistical tests to ensure that findings were truly findings, rather than artifacts of the collection mode (cf. Chase 2016).
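The kind of comparison a nonresponse bias check performs can be sketched with a two-proportion z-test. The sample sizes below match those reported for the NWOS (8,650 mail responses; 1,256 telephone contacts), but the 60% and 50% proportions are hypothetical illustrations, not NWOS values.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z-statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)       # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 60% of mail respondents vs. 50% of telephone-check
# contacts report a given attribute (sample sizes from the NWOS).
z = two_proportion_z(0.60, 8650, 0.50, 1256)
biased = abs(z) > 1.96   # significant at the 5% level -> evidence of bias
```

Under the conventional approach, a significant difference on a variable would feed into a nonresponse weighting adjustment, rather than a reclassification of the telephone contacts as respondents.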

    3. Area-based estimates versus ownership-based estimates

In the NWOS, there are many small acreage owners who control a small land mass, and very few large acreage owners who control a large land mass (e.g., acreages larger than 500 acres account for only 17.1% of the sample, yet control 27.6% of the acreage). Therefore, there is confusion when estimates are based on the number of owners versus when estimates are based on the area controlled by the owners. Providing estimates based on the number of owners underrepresents the conservation impact of large acreage owners, as it does not account for the proportionally larger acreage they control. As an example, an earlier NWOS report states, “only 3% of the owners have a written management plan…” (Butler and Leatherberry 2004). While this statement is technically correct, a casual reader or reporter may erroneously conclude that only 3% of private forest has a written management plan. Because most of the written plans are completed by large acreage owners, the amount of acreage that has a written plan is approximately 25%. In the words of the staff of the Northern Research Station and Family Forest Research Center (hereafter “FFRC staff”), “A challenge with presenting results from the National Woodland Owner Survey is how to clearly and precisely report different units, i.e., ownerships versus acres” (Butler et al 2012).
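The owner-based versus area-based distinction can be made concrete with a small, entirely hypothetical ownership list, in which 3% of owners hold written plans but those few owners control large acreages:

```python
# Hypothetical data: (acres_owned, has_written_plan) for 100 owners.
# Three large owners (2,000 acres each) have plans; the other 97 do not.
owners = [(200, False)] * 90 + [(100, False)] * 7 + [(2000, True)] * 3

pct_owners_with_plan = 100 * sum(has_plan for _, has_plan in owners) / len(owners)

total_acres = sum(acres for acres, _ in owners)
planned_acres = sum(acres for acres, has_plan in owners if has_plan)
pct_acres_with_plan = 100 * planned_acres / total_acres

# Only 3% of owners have a plan, yet roughly 24% of the acreage is
# under a plan -- the two units tell very different stories.
```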

    4. Data weighting

There is concern regarding the appropriateness of the weighting of the survey results. FFRC staff have provided different, often conflicting, narratives regarding the weighting process, at one point indicating there were no weights included in the statistical analysis. It is apparent from the USFS NWOS TableMaker application that the crosstabulations were weighted in some fashion; however, FFRC staff have not provided the actual values of the weights. Furthermore, it appears as if the NWOS did not follow the standard practice of using various weights to extrapolate to different domains of interest. Although the NWOS claims that ownerships are weighted proportional to their probability of selection, attempts to replicate the NWOS methodology raise concerns that the randomization of points within each hexagon is not truly random.
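The standard practice of weighting each ownership by the inverse of its selection probability (a Horvitz-Thompson estimator) can be sketched as follows; the respondents and probabilities are invented for illustration and are not NWOS values:

```python
# Each respondent: (acres_owned, probability_of_selection). Invented values.
respondents = [(50, 0.01), (120, 0.02), (800, 0.10)]

# Design weight = 1 / selection probability.
weights = [1 / p for _, p in respondents]

# Horvitz-Thompson estimate of the total acreage in the frame.
estimated_total_acres = sum(acres / p for acres, p in respondents)
```

Publishing the actual weight values would allow reviewers to verify exactly this extrapolation step.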

    5. Terminology

Word choice is extremely important in survey design because it influences the responses. Specific to the NWOS, the use of the term “wooded land” to signify forest may lead to incorrect conclusions. Under this term, respondents who simply have trees on their property are included with forest owners in the same analysis. This terminology overemphasizes the role of small wooded acreages, particularly those with fewer than 50 acres (roughly a 450 by 450 meter square). At the same time, the terminology marginalizes the owners of larger forests (500+ acres), who have the greatest influence on forest conservation through their large acreages. Additionally, the inclusion of “rangeland” in the study artificially yields a 12% increase in “wooded land.”

    6. Data Type

The 2013 NWOS treats 5-point continuous scales as five distinct categories, which is unorthodox in social science research. Further, the 2013 NWOS unnecessarily converts naturally continuous data into categorical data. Categorizing data in this fashion artificially reduces the explanatory power of the data, precludes regression analysis, and limits multivariate analyses.
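The loss of explanatory power from categorizing continuous data can be demonstrated on idealized, perfectly linear data (invented for illustration): correlating the raw values yields r = 1, while correlating the binned categories against the same response yields a smaller r.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

x = list(range(1, 21))          # a continuous predictor
y = list(range(1, 21))          # a perfectly linear response

r_continuous = pearson(x, y)    # 1.0 by construction

# Collapse x into four ordered categories (1-5, 6-10, 11-15, 16-20).
binned = [(v - 1) // 5 + 1 for v in x]
r_binned = pearson(binned, y)   # < 1.0: information was discarded
```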

    7. Sample size

The sample size is acceptable if the results are published for national-level estimates only. Initially, the target sample size for the 2013 NWOS was 250 responses per state, but the study achieved this goal in only 16 states. Therefore, if states are a domain of interest, the study design needs to be revisited, particularly if comparisons are made across several variables within a single state.

    8. Transparency

Metcalf and colleagues (2014) have called for further peer review of the methods of the NWOS. As replication is fundamental to peer review, the methodology and analysis should be made transparent; in fact, many academic journals now request or require that the data used for a journal article be made available to peers. Data collected about the American public, with American taxpayer funds, should be available for examination by the American public. Clearly there is a need for anonymity of respondents, as well as a need to assure future potential participants of their privacy. However, telling future respondents that their personally identifying information will be removed and that their responses will be made available to the scientific community would make sharing the dataset significantly easier. This is the approach of the Department of the Interior-U.S. Fish and Wildlife Service’s National Survey of Fishing, Hunting, and Wildlife-Associated Recreation.

    While the NWOS is highly rigorous in some areas, there are also study elements that could be considered unorthodox or even outside the standard business practices and norms of the social science community. This peer review concludes with several specific recommendations that have the intention of improving the scientific rigor and accessibility of the NWOS.


    Background of the NWOS

The National Woodland Owner Survey (NWOS) has been conducted by the USDA-Forest Service Family Forest Research Center (FFRC) since the 1970s, but only on a nation-wide basis since 1978 (Dickinson & Butler 2013). For four decades the NWOS has been considered the official census of the family forest owners in the United States. This survey has driven both administrative decisions and conservation policy and, as a result, has shaped public perception of privately-owned forests in America (Butler et al 2016b). Though the NWOS is the largest and most comprehensive study of its kind, the validity of its results has been called into question, as some critics believe the NWOS may be misleading because the behaviors and attributes of larger forest owners may differ from those of owners of smaller wooded lands (Gable & Ward 2018). This is important because about a third of America’s forest lands (or 290 million acres) are privately held by 10.6 million families. However, only 11% of those families hold over 70% of those lands. The attitudes and conservation behaviors of that 11% may be significantly different from those of owners of smaller forests or wooded lands.

To this end, Responsive Management was engaged to assess the validity of the survey by examining the methodology and reporting of the National Woodland Owner Survey.

    Within this broad mandate, specific components of the NWOS research process were investigated:

    A. the sample size for each state, and the effects of the survey intensification process (see table in “Sample Size Error” section)

    B. the quantity of landowners of various acreage sizes and how it affects the results and their interpretation (see table in “Sample Size Error” section)

C. the methodology for converting statistics of the sample to population estimates

D. the focus group process that determined phrases and the construction of the wording of the questions:

i) the use of the term “woodland” in the survey versus “forest land”

ii) the inclusion of rangeland in overall forest statistics and analysis

    Methodological Assessment

The paradigm for evaluation of the NWOS is Total Survey Quality. This approach was popularized by Dillman and colleagues and is the accepted standard business practice of the American Association for Public Opinion Research (AAPOR) (Biemer 2010; Dillman et al 2014). In the Total Survey Quality approach, the ultimate goal of the researcher is to minimize Total Survey Error (TSE), which is a comprehensive accounting of the aspects of public opinion polling that are subject to error (Weisberg 2005). Most researchers follow a version of this approach; therefore, any publications or technical reports should have each point well-documented for peer review. TSE is divided into constituent parts as follows:

Sampling Error

In order for sample statistics (the value found within the sample) to match population parameters (the true value of the underlying population), there is an assumption of random sampling. Any sampling scheme that varies from simple random sampling must be penalized for any stratification imposed.


Sample Size Error

As sample size progressively increases, the uncertainty surrounding the sample statistics (the value found within the sample) shrinks and becomes more characteristic of the population parameters (the true value of the underlying population). A survey estimates the results of a hypothetical census (complete polling of all the population). Frequently, budget or logistics prevent a census, and therefore the results of the survey are often “close enough.” The trade-off is extensive cost savings in exchange for an accepted margin of error that surrounds the statistic wherein the true estimate lies. Often during presidential races or ballot measures it is acceptable to have a 3% margin of error, which allows a researcher to survey just 1,068 people, so long as those people are representative. Therefore, if a statistical report on a ballot initiative indicates it will pass at 64% (±3%), there is a great likelihood the initiative will pass with between 61% and 67% approval.
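The 1,068-respondent figure follows from the standard sample-size formula for a proportion at 95% confidence, assuming the worst-case p = 0.5:

```python
import math

def sample_size(moe, z=1.96, p=0.5):
    """Respondents needed for a given margin of error on a proportion,
    assuming simple random sampling and worst-case variance (p = 0.5)."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

n_3pct = sample_size(0.03)   # 3% margin of error -> 1,068 respondents
n_5pct = sample_size(0.05)   # a looser 5% margin needs far fewer
```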

Estimator Error

This type of error is introduced by an incorrect choice of estimator or by using a statistical approach that is inappropriate in the research context. For example, researchers outside of the social sciences will frequently report Likert-type questions as categorical data, although treating them as a scale is more informative and appropriate.

Specification Error

This error appears when the concept measured by the survey differs from the concept that should have been measured, or from the concept discussed in the research. Specification error is often introduced when one question is used as an index for another, such as income being used as a surrogate for socioeconomic status, or when the question functions for only a portion of the population (“Do you currently have a job?” could work as a surrogate to calculate unemployment but would miscategorize retirees as unemployed and vastly overestimate unemployment).

Coverage Error

This error is introduced when the universe the sample is drawn from does not match the population the researcher is trying to measure. No sampling frame has 100% coverage for all research questions, but many have acceptable coverage at greater than 95%. A historically prominent example is the Truman-Dewey presidential campaign of 1948, when telephone surveys were conducted at a time when telephones were readily available only to wealthier Americans. The sample frame therefore contained more Republican voters than occurred in the population, and newspapers prematurely called the race for Dewey based on this research, even though it was subject to a clear coverage error.

Nonresponse Error

This error occurs when people responding to the survey are systematically different from those who chose not to respond. This error is also introduced if there is a question on a survey that is systematically skipped, such as questions seeking information of a sensitive nature (item nonresponse error). Survey questions about income are frequently skipped by North American respondents because the questions violate social norms regarding discussions of wealth.

Measurement Error

Measurement error comes from people interacting with the questionnaire in undesirable ways, such as when respondents deliberately or unintentionally provide incorrect information to questions on the survey. Respondents may adjust their response to a socially desirable response (“How many times do you exercise in a week?”) or strategically bias their answers in an effort to manipulate outcomes of the survey (“Would you be willing to pay a dollar more for this product?”). Measurement error can also derive from poorly constructed questions that are ambiguous, are too long, have confusing instructions, or have terms with multiple interpretations.

Processing Error

Data processing error encompasses errors in data preparation, coding, misapplication of weights, or misreporting. This is frequently the least contributing factor because there are many safeguards in place to prevent processing error and because the computing power of today allows for replicability. Frequent examples include an Excel spreadsheet that contains an incorrect cell reference when calculating weights, forgetting to use standardized weights, or the rounding error that occurs when an insufficient number of significant digits is used when adjusting the weights to apply back to the population.


Total Survey Error for the NWOS

Information for the NWOS was taken from academic journals, such as the Journal of Forestry and the Northern Journal of Applied Forestry, and from USFS technical reports (see Appendix A for the Literature Reviewed). Personal communication with the staff of the Northern Research Station and Family Forest Research Center (“FFRC staff”) is noted, as are cases where information was not available.

Sampling Error

The NWOS has a unique sampling design that is not conventional in the social sciences. To summarize the sampling scheme: the respondent pool derives from the Forest Inventory and Analysis (FIA) sample design (Bechtold and Patterson 2005), wherein a tessellation of 5,937-acre hexagons was superimposed on a GIS layer of the United States. Within each hexagon, a sample point was randomly chosen. Because the United States covers 3.797 million square miles, the quantity of sampling points is approximately 409,000. The study has a 5-year rotation, meaning that in any given year approximately 82,000 points are examined. Because Connecticut, Delaware, Hawaii, Massachusetts, Maryland, Montana, New Jersey, and Rhode Island would have yielded smaller sample sizes, the 2013 NWOS used smaller hexagons to obtain larger samples from these states.
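The approximate point counts quoted above can be checked with simple arithmetic (640 acres per square mile):

```python
# Back-of-envelope check of the sampling-frame figures above.
ACRES_PER_SQUARE_MILE = 640
US_AREA_SQUARE_MILES = 3_797_000
HEXAGON_ACRES = 5_937

total_points = US_AREA_SQUARE_MILES * ACRES_PER_SQUARE_MILE / HEXAGON_ACRES
points_per_year = total_points / 5   # 5-year measurement rotation

# total_points comes to about 409,000 and points_per_year to about
# 82,000, matching the figures reported for the FIA design.
```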

    The selected hexagons were then examined using remote sensing to determine if the sample point was forest. Forest is defined by FIA as “land at least 120 feet (37 meters) wide and at least 1 acre (0.4 hectare) in size with at least 10 percent cover (or equivalent stocking) by live trees including land that formerly had such tree cover and that will be naturally or artificially regenerated” (Oswalt et al 2014, p31) and as further defined in the FIA field manual (U.S. Forest Service 2010). Hexagons were then numbered from 1 to 5 to impose a 5-year inventory cycle.

    These locations were then cross-referenced with publicly-available tax records and assigned to 1 of 16 ownership categories. Owners of private forests were then recruited to participate in the NWOS.

    Observations The following observations were made based on the description of the methods:

    1) In general, there are several aspects where the methods used for the study and the criteria used for decision making are sparsely documented. For research to contribute to the larger body of science, it must be presented with enough detail to be replicable.

2) In no technical document or publication of the NWOS is there a discussion about adjustments made for design effect (a “penalization” for the departure from simple random sampling). Further, the methods are not sufficiently documented to recommend a value for the design effect adjustment.
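The “penalization” in question is conventionally expressed as a design effect (deff): dividing the nominal sample size by deff gives the effective sample size, which widens the margin of error. The deff of 1.5 below is purely an assumed illustration, since, as noted, no value is documented for the NWOS.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

n = 9_615      # 2013 NWOS responses (acreages of 1+ acre)
deff = 1.5     # assumed design effect, for illustration only

moe_srs = margin_of_error(n)            # ~1.0% if truly simple random
moe_design = margin_of_error(n / deff)  # ~1.2% after the deff penalty
```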

    3) The decision to conduct more intensive sampling in some states and the criteria for selecting which states received more intensive sampling were not well-documented, other than “there were private funds available” (Butler et al 2013). There is no written justification provided for spending additional resources on states not well known for their timber production, rather than creating tighter statistical estimates in geographies more known for forested lands. Additionally, there was not sufficient reasoning for spending the funds in northeastern states rather than focusing those efforts in states with smaller sample sizes.

4) The remote sensing methodology may have additional errors:

a) Error propagation is a well-known and well-established issue in GIS analysis and is not addressed for the NWOS in any available literature.

b) Error is introduced if there is a time lag between the time that the FIA data were collected and the time the questionnaires were mailed to respondents. Changes on the landscape may happen very quickly; if the coverage data layer is not current, it may lead to misidentification of forested lands. The version of the coverage layer was not reported and could be a confounding factor.

c) A portion of the definition of forest used on the NWOS questionnaire (“land that formerly had such tree cover and will be regenerated…”) may not be identified in the remote sensing process; therefore, recently harvested acreages may be under-represented. Because larger acreages are more frequently owned for timber harvesting purposes, the under-representation from remote sensing may be systematically biased against larger acreages.

d) The ground-truthing process of the remote sensing is not documented sufficiently. Specifically, the error rates of false positive (i.e., remote sensing indicated forest but ground truthing determined it was not forest) and false negative (i.e., remote sensing indicated no forest but ground truthing determined it was forest) findings were not reported.

e) The percentage of points from the hexagons that were ground-truthed was not documented.

5) The crosswalk process from the GIS output to the tax record introduces error:

a) The percent of points from the hexagons that were identified as privately-owned forest and were successfully matched to a legitimate tax record was not reported. Additionally, the number of forest acreages that were identified but that had no associated contact information is unknown.

b) There are errors in the tax record that presumably led to the 8% nondeliverable rate. Those potential respondents were treated as if they were ineligible (known-ineligibles) rather than simply not contactable (nonrespondents of unknown disposition). This is incongruent with AAPOR standard practices and causes the stated compliance rate to further diverge from the response rate.

c) The process for determining if a “corporation is only for tax purposes” is not well documented, and it is unclear if the process is objective. The criteria for exclusion from, or inclusion in, this category are not identified and therefore are not repeatable. Further, if the criteria for inclusion or exclusion are based on self-reported information, the process is imperfect, as self-reporting to a federal government agency is likely to be biased, particularly on issues relating to income.

6) As there is a known point of diminishing returns from prior research, it is unclear why the random selection was not stratified prior to data collection, with post hoc weighting applied afterward. This is particularly curious when states are a known domain of interest to many stakeholders.

7) Because the roughly 6,000-acre hexagons are tessellated and numbered 1 to 5 (meant to impose a 5-year inventory cycle), larger acreages are more likely to be included in the survey in successive years. If an acreage is thrown out in subsequent years because it was sampled in the prior year, large acreages are likely underrepresented.

Sample Size Error

In research, conducting a census (contacting all eligible participants) is prohibitively expensive; therefore, a survey (a random sampling of all eligible participants) is conducted. In exchange for the significant cost savings of conducting only a survey, the researcher accepts some amount of uncertainty surrounding the estimates. A measure of the uncertainty surrounding each estimate is the Coefficient of Variation (CV). A CV near 0% means the estimate is extremely precise, while a CV further from 0% means there is more uncertainty in the estimate. This uncertainty is the source of sample size error, and it stands to reason that as the number of respondents to a survey increases, the uncertainty surrounding the estimate shrinks.
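Because the standard error of an estimate shrinks in proportion to 1/√n, the CV exhibits diminishing returns as responses accumulate. The baseline figures below (a 30% CV at 50 responses) are assumed purely for illustration:

```python
import math

def cv_for_n(n, base_cv=30.0, base_n=50):
    """CV for a sample of n, assuming CV scales with 1 / sqrt(n)
    from an assumed baseline of 30% CV at 50 responses."""
    return base_cv * math.sqrt(base_n / n)

cv_250 = cv_for_n(250)   # ~13.4%
cv_500 = cv_for_n(500)   # ~9.5%: doubling the effort buys only ~4 points
```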

    A graph of the relationship between the sample size and the CV for each state in the 2006 NWOS indicated that samples larger than 250 responses tightened the CVs only slightly (Figure 3 reprinted from Butler et al 2016a, p7). Therefore, for the 2013 NWOS, the goal of a minimum of 250 responses per state was established (Butler et al 2016a). With the end goal of having CVs smaller than 15% or a minimum of 250 per state, the NWOS should have had a sample size of 12,500 respondents to the survey nationally. The 2013 NWOS sent out 21,250 mail surveys and 9,615 eligible participants responded to the survey. Sixteen states, or about one-third, met the 2013 NWOS goal of 250 respondents.

Sample sizes for the NWOS point estimates are difficult to ascertain. Factors that contribute to the confusion of the sample size include the treatment of nonresponses, item-nonresponse error, the differing treatment of owners with acreages under 10 acres, the new addition of rangeland into the study, and the different states that are excluded from various analyses.

Figure 3 from Butler et al 2016a – The relationship between the uncertainty (CV) and sample size from the 2006 NWOS. 250 respondents from each state was the sample size goal set for the 2013 NWOS.

In some NWOS documents it is stated that 40,478 hexagons yielded 10,092 respondents. In other locations, the NWOS claims 9,861 responses nationally from owners with acreages of 1 acre or greater. In still other locations, such as the NWOS TableMaker (2019), the responses are 9,615 for acreages 1 acre and larger and 8,507 responses for acreages 10 acres and larger. The table below indicates the sample sizes by size of acreage for only the question “How many acres of wooded land do you currently own in [STATE]?” This is one of the six questions on the telephone survey, so responses may be smaller for other questions. Item nonresponse is also variable within the mail responses, so the sample size varies depending on the crosstabulation performed.

It is standard best practice to state, prior to data collection, what percentage of the survey must be completed by a respondent to be considered a completed response. The stringency or leniency of this decision is situationally contingent upon the stakes of the survey, the availability of funds, and the caliber of the research institution. However, the percentage required to be considered a completion is not available for the NWOS; thus we cannot know how many questions respondents had to answer for their partial surveys to be included in individual question analyses. It was ascertained that 8,650 responses were collected via mail and 177 were collected electronically. A nonresponse bias check was then conducted with 1,256 people via telephone. The nonrespondents from the telephone nonresponse bias check were then reclassified as respondents and grouped with the mail and web responses. It is important to note that the people completing the telephone nonresponse bias check answered 22% of the full mail questionnaire (Butler et al 2016a, p9), a percentage of completion that few social scientists would consider a full response.

    Observations Based on the description of the methods, the following observations were made:

    1. When drawn completely at random, a sample of 1,200 respondents is frequently sufficient to discuss the entire U.S. population. Therefore, the NWOS sample size is acceptable if the results are published for national level estimates only. The sample sizes at the state level or other domains of interest (e.g., landowners of larger than 5,000 acres) may be insufficient to make statistical inference. This may lead to erroneous conclusions regarding the management of private forests.

Size of acreage   Responses   Percent

1-10                  1,038      10.8
11-19                   703       7.3
20-49                 1,747      18.2
50-99                 1,675      17.4
100-199               1,527      15.9
200-499               1,392      14.5
500-999                 604       6.3
1000-4999               658       6.8
5000-9999               113       1.2
10000+                   88       0.9
Total                 9,615

    NONRESPONSE BIAS CHECK – Collecting data on those who did not participate in the survey to investigate if they are systematically different from those who did participate. The NWOS nonresponse bias check showed respondents to the survey were statistically significantly different from nonrespondents on 2 of the 6 variables collected via telephone.


    2. The NWOS, along with many other national level studies that suffer from sample size limitations, could benefit from Small Area Estimation methods (cf. Ver Plank et al 2017). FFRC staff are considering this approach for the next NWOS; however, Bayesian approaches are not well understood by the educated lay public, and FFRC staff should take care to clearly convey results.

3. In the 2013 NWOS, owners of acreages over 500 acres were only 17.1% of the sample, yet they control 27.6% of the private forest acreage. In fact, the distribution of the sample sizes results in margins of error of 4.0%, 3.8%, 9.2%, and 10.5% for acreages sized 500-999, 1000-4999, 5000-9999, and 10000+, respectively. This creates a scenario wherein information about the largest acreages of private forest (and presumably the greatest conservation impact) is known with the least amount of precision. Conversely, approximately half of the sample is from owners of acreages of 100 acres or fewer (an acreage of 700 by 700 yards, or 0.156 of a square mile). Future NWOS studies should strongly consider oversampling owners of larger acreages so that estimates reflecting the largest acreage of private forest are more precise. The purpose of this oversampling is not to discount the conservation contribution of smaller acreages, but to acknowledge that the attitudes and behaviors of large acreage owners impact conservation at a larger scale.
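Assuming a worst-case proportion and simple random sampling within each size class, the respondent counts in the acreage table above imply margins of error very close to the figures quoted (the largest class computes to roughly 10.4%):

```python
import math

def moe_pct(n, z=1.96, p=0.5):
    """Worst-case margin of error, in percent, for a proportion
    estimated from n respondents under simple random sampling."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Respondent counts by acreage class (from the NWOS TableMaker table).
strata = {"500-999": 604, "1000-4999": 658, "5000-9999": 113, "10000+": 88}
margins = {label: round(moe_pct(n), 1) for label, n in strata.items()}
```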

    4. The percentage that the questionnaire needs to be answered to be considered a completed response should be established prior to data collection.

5. The technical report for the 2013 NWOS indicates it is possible to make estimates for geographies of interest other than states (e.g., regional, sub-state, etc.) but makes the caveat that “a sufficient sample size is required in any sub-state geographic area to ensure confidentiality of the respondent and that the statistics are reliable. Ideally, each geography of interest would have at least 100 respondents” (Butler et al 2016a, p37). Yet 11 states included in the 2013 NWOS have fewer than 100 respondents.

6. People who participated in the telephone nonresponse bias check should not be reclassified as survey respondents without exceptional justification, particularly when they answered only 22% of the original questionnaire.

7. Given the overall sample size (9,615), the CV goals likely could have been met for nearly all states, using the established budget, if data collection efforts were allocated more efficiently. This could be achieved by collecting only the stated sample size goal of 250 responses per state. In the 2013 NWOS, 15 states exceeded the sample size goal, with a total of 1,049 respondents in excess of 250 per state (e.g., in Vermont, 485 responses were collected; the 235 responses beyond the 250 needed are nearly enough to cover another entire state). Those data collection efforts could have been reallocated to other states with lower sample sizes.

8. All three states that had intensified data collection greatly exceeded the sample size goals (VT: 485, CT: 331, and TX: 454).

9. In the reports, there is no accounting of the sample size by state. The best estimates from the NWOS TableMaker are as follows:


State  1-9 ac  10-19 ac  20-49 ac  50-99 ac  100-199 ac  200-499 ac  500-999 ac  1000-4999 ac  5000-9999 ac  10000+ ac  TOTAL

    AL 10 13 39 31 42 51 27 51 10 9 283

    AK - - - - - - - - - - -

    AZ 2 2 7 6 1 - 1 - - 3 22

    AR 5 11 36 33 43 42 24 29 4 2 229

    CA 11 7 15 16 12 23 12 25 2 5 128

    CO - - 2 1 3 - 2 4 2 1 15

    CT 112 39 66 56 4 12 2 - - - 331

    DE 56 22 59 54 33 19 1 2 - - 246

    FL 19 21 15 11 23 24 21 21 9 2 166

    GA 15 6 26 26 34 52 35 49 5 1 249

    HI 29 6 10 7 5 3 5 5 - 1 71

    ID 1 1 6 9 7 9 3 5 - - 41

    IL 12 14 57 39 33 26 6 2 - - 189

    IN 26 34 69 59 41 20 7 2 - - 258

    IA 8 12 33 31 39 20 1 2 - - 146

    KS 8 18 34 30 24 15 4 3 - - 136

    KY 15 9 53 57 40 36 16 10 - - 236

    LA 6 10 27 18 18 31 19 19 4 3 155

    ME 13 17 41 64 53 41 19 15 4 1 268

    MD 88 40 91 66 42 41 11 8 - - 387

    MA 64 23 56 32 26 19 6 3 - - 229

    MI 22 31 71 60 54 39 7 5 4 - 293

    MN 26 27 98 79 70 60 14 3 - - 377

    MS 7 12 17 35 35 57 25 29 1 2 220

    MO 24 32 58 76 75 61 18 17 1 - 362

    MT 8 6 17 17 12 36 30 37 11 2 176

    NE 3 3 11 7 11 7 8 6 - - 56

    NV - - - - - - - - - - -

    NH 19 12 26 28 31 30 10 9 - - 165

    NJ 78 24 34 28 18 8 3 - - - 193

    NM 2 1 1 4 6 2 2 9 6 2 35

    NY 40 24 62 76 64 34 2 3 - 1 306

    NC 23 17 30 41 37 31 16 16 2 2 215

    ND 1 - 4 3 3 4 4 1 - - 20

    OH 35 4 58 56 36 13 6 1 1 - 240

    OK (East) 8 8 31 31 24 27 16 21 - 2 168

    OK (West) 4 5 20 3 16 22 10 13 4 - 97

    OR 10 7 18 12 16 19 11 22 4 6 125

    PA 36 22 58 48 36 30 10 7 3 - 250


    RI 7 6 3 2 - - - - - - 18

    SC 11 10 25 28 38 50 23 38 3 1 227

    SD 1 3 4 2 2 6 3 4 - - 25

    TN 31 19 40 46 51 57 15 9 3 1 272

    TX (East) 4 11 33 27 30 34 28 20 - 2 189

    TX (West) 6 8 15 25 25 40 32 61 20 33 265

    UT 2 2 4 1 1 13 2 6 3 1 35

    VT 45 34 68 94 105 84 29 22 3 1 485

    VA 24 19 42 48 52 38 18 19 4 1 265

    WA 16 11 18 15 6 14 11 10 - 1 102

    WV 14 15 45 47 67 47 15 11 - 2 263

    WI 30 25 94 90 83 45 14 3 - - 384

    WY 1 - - - - - - 1 - - 2

    Total 1038 703 1747 1675 1527 1392 604 658 113 88 9615
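As a cross-check, the 1,049 excess responses cited in concern 7 can be reproduced from the state totals in the table above:

```python
# States whose totals in the table exceed the 250-response goal:
over_goal = {"AL": 283, "CT": 331, "IN": 258, "ME": 268, "MD": 387,
             "MI": 293, "MN": 377, "MO": 362, "NY": 306, "TN": 272,
             "TX (West)": 265, "VT": 485, "VA": 265, "WI": 384, "WV": 263}
excess = sum(n - 250 for n in over_goal.values())
print(len(over_goal), "states over goal,", excess, "excess responses")
# 15 states over goal, 1049 excess responses
```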

Estimator Error

In the 2013 NWOS, the statistical analyses, and the associated measures of error around the estimates, were mostly appropriate (with the stipulation that the data were collected in a correct fashion). Most analyses presented were simple overall means, grouped means, and crosstabulations. These analyses are routinely conducted with little controversy.

One concern with the NWOS is that ownership-level estimates were often used when an area-based estimate would be more appropriate. These two levels of estimates are conflated by the authors themselves; in their words, "a challenge with presenting results from the [NWOS] is how to clearly and precisely report different units, i.e., ownerships versus acres" (Butler et al 2012). A salient example of the discrepancy between ownership-based and area-based estimates is the question regarding having a written management plan. As Figure 1 shows, the ownership-based and area-based responses diverge, in some cases by as much as 20% (10,000+ acres). The NWOS authors should be commended for reporting area-based estimates, such as the finding that "only 1 in 4 acres of these ownerships are owned by someone who has a written forest management plan, and only 1 in 3 acres are owned by people who have received forest management advice" (USFS NRS-INF-31-15 2015, p7). However, inappropriate ownership-based estimates should be avoided, for example, the statement that "only 3% of the owners have a written management plan while 16% have sought management advice" (Butler and Leatherberry 2004).
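The divergence is a weighting effect; a minimal sketch with hypothetical ownership counts (not NWOS data) shows how a behavior that is rare among owners can still cover a large share of acres:

```python
# Hypothetical strata: (number of owners, acres per owner, share with a plan).
groups = [
    (900_000,    30, 0.02),   # many small holdings, plans rare
    ( 90_000,   300, 0.15),
    (  9_000, 3_000, 0.45),   # few large holdings, plans common
]
owners_total = sum(o for o, _, _ in groups)
acres_total = sum(o * a for o, a, _ in groups)

pct_owners = sum(o * p for o, _, p in groups) / owners_total    # ownership-based
pct_acres = sum(o * a * p for o, a, p in groups) / acres_total  # area-based

print(f"owners with a plan: {pct_owners:.1%}")  # 3.6%
print(f"acres under a plan: {pct_acres:.1%}")   # 20.7%
```

The same underlying data honestly support both "very few owners have a plan" and "a substantial share of the acreage is under a plan," which is why the choice of unit must match the claim being made.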

ESTIMATOR ERROR – Using a statistical approach that is inappropriate in the context of the research introduces estimator error.

Additionally, the authors subscribe to Stevens's levels of measurement typology (1946) by treating Likert-scale style questions as categorical data (such as the 5-point scales of importance for each reason for owning wooded land). This typology of measurement is largely outdated, as most research treats these questions as continuous data. This distinction is minor but important, because treating them as categorical data precludes any regression analysis and limits any higher-order multivariate analyses. Further, the 2013 NWOS, at times, artificially induced treatment of continuous variables as categorical data (e.g., the size of acreages was collected as continuous data and is reported as a 10-category crosstabulation). There is a significant cost to this artificial categorization (Cohen 1983) because it artificially reduces the variance within the data and therefore reduces the explanatory power of the data (Vaske 2008).
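The cost Cohen describes can be demonstrated in a few lines; below is a simulation sketch (hypothetical data, not NWOS values) in which a continuous predictor is median-split before correlating with a response:

```python
import random

def pearson(a, b):
    """Pearson correlation computed from raw sums."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

random.seed(1)
# Simulated continuous predictor (think: acreage) linearly related to a response.
x = [random.gauss(0, 1) for _ in range(10_000)]
y = [xi + random.gauss(0, 1) for xi in x]

# Median split: the continuous predictor collapsed into two categories.
med = sorted(x)[len(x) // 2]
x_binned = [1.0 if xi > med else 0.0 for xi in x]

r_cont = pearson(x, y)        # near 0.71 in this setup
r_binned = pearson(x_binned, y)  # attenuated, near 0.56
print(f"continuous r:   {r_cont:.3f}")
print(f"median-split r: {r_binned:.3f}")
```

The binned predictor recovers noticeably less of the relationship than the continuous one, even though no information was added or removed from the response; a 10-category split loses less than a median split, but the direction of the cost is the same.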

A further complication that arises from this categorization is that crosstabulations are only able to show statistical significance through the family of chi-square tests. These tests of significance are strongly influenced by sample size but are not reported in the NWOS with an effect size measure, such as Cramer's V or φ (phi), as is the standard of the field. This is crucial because a statistical test can indicate statistical significance although there is no meaningful, practical significance. A hypothetical example of this is that one acreage could have an average DBH of 38 inches and another acreage an average DBH of 39 inches, a difference that could be statistically significant (p < 0.05) given a large enough sample yet carry little practical meaning.


Observations

Based on the documentation available, the following observations were made:

1) The NWOS is mostly reported using means and crosstabulations. More information could be learned about the forests of America, and the families who own them, by using multivariate analyses.

2) It seems outliers were identified and excluded from the analysis using 1.5 times the interquartile range, but it is unclear if this outlier analysis was conducted before or after weighting.

3) The inclusion of effect size measures would greatly improve the understandability and interpretation of the NWOS.
a) Given a large enough sample size, all crosstabulations become statistically significant on the chi-square distribution, producing false positive findings in some cases. There is a good probability the NWOS, with a sample size of 8,000 to 10,000, has false positive statistically significant findings.
b) Effect sizes can be estimated from the 2013 NWOS TableMaker for some findings; however, the raw data are required to calculate effect sizes exactly and confirm false positive findings, if any.
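The interplay of sample size and effect size in 3a can be made concrete. In the sketch below, a hypothetical 2×2 crosstabulation keeps the same weak association while the sample grows 100-fold; the chi-square statistic crosses the significance threshold while Cramér's V, the effect size, does not move:

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic and Cramér's V for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = sum((obs - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i, row in enumerate(table) for j, obs in enumerate(row))
    v = math.sqrt(chi2 / (n * (min(len(row_tot), len(col_tot)) - 1)))
    return chi2, v

# The same weak 52/48-vs-48/52 association at n = 200 and at n = 20,000:
small = chi_square_and_cramers_v([[52, 48], [48, 52]])
large = chi_square_and_cramers_v([[5200, 4800], [4800, 5200]])
print(f"n=200:   chi2={small[0]:.2f} (< 3.84, not significant), V={small[1]:.2f}")
print(f"n=20000: chi2={large[0]:.2f} (>> 3.84, significant),    V={large[1]:.2f}")
# Both tables carry the same trivial effect (V = 0.04); only the chi-square
# statistic changes, purely because n grew (3.84 is the df=1, alpha=.05 cutoff).
```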

Specification Error

The questions contained on the questionnaire are reflective of the constructs they were intended to measure, with a few important exceptions.

Most notably, the use of the term "wooded land" versus the term "forest" is problematic. The term "wooded land" was selected based on an effort to be consistent with other definitions used by the USFS and on the findings of focus groups held in several states. However, the pretesting procedure, if conducted, and the process by which the focus groups were run were not documented in sufficient detail to be replicable. Because the focus group results so heavily influenced the decision to use the term "wooded land," the procedure should have been more thoroughly documented. Further, the term "wooded land" was consistently used within the questionnaire, yet the term "forest" is primarily used in articles and technical reports. For example:

"Assessing forest ownership dynamics in the United States: Methods and challenges."
"Family forest ownerships of the United States."
"America's family forest owners."
"Understanding and reaching family forest owners: Lessons from social marketing research."
"Progress in private forest landowner estimation."
"Who owns America's trees, woods, and forests?" [emphasis added]

The transposition of "wooded lands" and "forest" is somewhat misleading to the casual reader, and the nuances of the substitution are not discussed or analyzed in full. Measuring one attitude object, in this case "wooded land," and then reporting on another ("forest") is outside the norm of social science.

SPECIFICATION ERROR – When the concept measured by the survey differs from the concept that should have been measured, specification error is introduced into the study.

As is typical with studies that span a significant period of time, there is occasionally a need to update definitions. However, when a definition changes, there is limited use in comparing between time periods because it is unknown whether an observed difference is attributable to the passage of time or to the different definitions. This is the case with the inclusion of "rangeland" in the 2013 NWOS, which artificially increases the 2013 findings by approximately 12%. The inclusion of rangeland is particularly influential in the 10,000 acres or more category of ownership. In the 2013 NWOS, there are only 88 owners of 10,000+ acres, and 33 of them are located in West Texas (compared to 9, 6, and 5 owners in Alabama, Oregon, and California, respectively). This places 38% of the responses for 10,000+ acreages in a geographic location where USFS timber maps indicate there are no timberlands. Therefore, the attitudes, concerns, and management practices of these landowners likely represent rangeland stewardship practices rather than forestry stewardship practices, and thus skew the results for owners of 10,000 acres or more.

    This confounding factor is captured in a recent document regarding the NWOS:

    Between 2006 and 2013, the estimated area of all family forestland (i.e., including ownerships with 1+ acres) in the United States increased by 26 million acres, but the change is largely attributable to increased acreage in western Texas because it was recently inventoried for the first time. In a comparison of only states where data are available for 2006 and 2013, which drops Alaska, Hawaii, Nevada, western Oklahoma, and western Texas, the total private forestland area increased by 1.8 million acres. Over this same time period and geography, there was a net loss of 5.1 million acres of family forestland. (Butler et al 2016b, p643)

    Therefore, the inclusion of rangeland in more recent NWOS should be better justified, particularly when the NWOS is billed to be “the official census of forest owners” and the stated aim of the NWOS is to increase “understanding of woodland owners who are the critical link between forests and society” [emphasis added]. Because the inclusion of rangeland is such a confounding variable for the 2011-2013 NWOS, there are two reports, one including all 2011-2013 data and another report containing just data that is backward compatible to the 2006 version.

Observations

Based on the documentation available, the following observations were made:

    1) Defining the term “wooded land” at the top of the questionnaire is commendable. Disambiguation of the subject so respondents are evaluating the same object is key to solid research.

    2) The authors are to be commended for making the 2013 data backward compatible to the 2006 data.

CONFOUNDING FACTOR – A confounding factor is a variable that is related to one or more of the variables defined in a study. A confounding factor may mask an actual association or falsely demonstrate an apparent association between the study variables where no real association exists. If confounding factors are not measured and considered, the conclusions of the study may be biased.


3) It is unclear why rangeland was not included in previous iterations of the NWOS but was included in the 2013 NWOS.

4) The outcomes of the cognitive testing, if completed, were not presented.

5) At times in the documentation, focus groups and pretesting are referred to as distinct research efforts, and at other times they are used interchangeably. It is unclear if the focus groups were conducted and then a traditional pretest was completed, or if the authors consider the information from the focus groups a sufficient pretest.
a) If a quantitative pretest was completed, the results and alterations to the questionnaire as a result of the pretest should be reported.
b) If a quantitative pretest was not completed, that would be a major departure from the standard in social science research.

6) A quantitative pretest should be conducted in future NWOS studies.

7) Focus groups were held in New Mexico, Maine, North Carolina, Missouri, and Hawaii. These states may be geographically comprehensive; however, they are not representative of geographies with forests. For example, New Mexico only had 46 responses and Hawaii had 78 responses, yet these were two of the states where focus groups helped determine the understanding and terminology of the survey.

8) As is common in social science research, the 2013 NWOS tried to write the questionnaire at the eighth-grade level and attempted to avoid technical jargon.
a) According to Butler and colleagues (2016a, p9), "The term forest was not used in the survey instrument because it has a different connotation to owners and forestry professionals. Instead the phrase 'wooded land' was used throughout the survey."
i) The basis for this determination was not documented.
ii) It is likely the average individual would not consider an acre of land with 10 trees a forest. Therefore, "wooded land" is likely used to increase response rates among people who would normally return the survey marked "Not Applicable."
b) The term "forest" is easily understood at the eighth-grade level.
c) "Forest" is not technical jargon.
d) According to the Google Ngram Viewer, which charts the prevalence of terms over time, "forest" is 1,915 times more common than "wooded land" in the corpus of the English language.

[Figure: Prevalence of "forest" and "wooded land" across time (Google Ngram Viewer); x-axis: year, y-axis: prevalence.]


9) Because of the role the focus group process played in selecting "wooded land" as the attitude object (the concept being evaluated by the respondent), improved documentation is needed. Specifically:
a) Information on the focus group moderator:
i) the level of expertise of the moderator;
ii) whether the moderator was the same in all states;
iii) steps taken to ensure consistency between states;
iv) steps taken to avoid interviewer/moderator bias.
b) The stratification used to ensure representation of all strata.
c) The decision process for choosing the states for focus groups.
d) The point at which the researchers achieved information saturation (the point in focus group discussion where no further information is being gleaned from participants).
e) Redacted transcripts of the focus groups should be made available for review because they influenced the verbiage used to such a high degree.

10) The focus group moderation guide should be available, to be consistent with the minimum standards of peer review.

11) If a traditional pretest was conducted after the focus groups, as is conventional, further documentation is needed to answer the following:
a) What type of respondent was included in the pretest?
b) Were all strata of respondents represented?
c) In what mode was the pretest conducted? If mail-based, how was a representative sampling ensured?
d) How was meta-information (information about how the respondent reported information) recorded and analyzed?

12) The 2013 NWOS uses the term "wooded land" and defined it on the survey to achieve disambiguation. Future NWOS iterations should use the term "forest" and define it in like manner.
a) The respondents will be evaluating the same attitude object whether it is labeled "forest" or "wooded land."
b) This would engage the constituents who own 57% of the private forest acreage while simultaneously not disenfranchising smaller forest owners.
c) As constituted, using the term "wooded land" is controversial, particularly to the owners of the largest acreages of private forests.
d) The 2013 NWOS presented an opportunity to create a split-sample experiment with some respondents receiving "wooded land" and others receiving "forest." Many researchers would have conducted this experiment in the pretest, but the outcomes of such an experiment were not available for review.

13) The NWOS extensively defines "wooded land" but does not define other potentially ambiguous terms, such as "green certification" in question 18 of the survey.
a) The NWOS provides examples of groups that certify, such as Tree Farm, Green Tag, Forest Stewardship Council, and Sustainable Forestry Initiative, yet these groups do not use the term "green certification."
b) Results from this question may not reflect the true nature of the forest certifications actually on the landscape.


c) Similarly, the terms "management plans" and "timber harvesting" may be imprecise in their measurements.

14) When the term "wooded land" is defined in item #3 of the survey, it is unclear if the acreage must meet all three components to be considered "wooded land" or if the acreage only has to meet one component.
a) Even though using the term "or" is usually against questionnaire design conventions, including an "-OR-" between each point would clarify the researchers' intentions.

15) Ownerships of less than 10 acres (approximately two football fields by two football fields) were not included in the analysis because "they are primarily large backyards associated with house lots; traditional forestry practices are not applicable at this scale; most forestry professionals do not consider these forests; many of these smaller acreages do not qualify for assistance programs; and the large sampling errors associated with estimates that include small ownerships make trend analysis more difficult."
a) By the same reasoning, everything under 20 acres (a square approximately 284 meters on a side) or 50 acres (a square approximately 450 meters on a side) could also be considered too small to be included in the analysis by some forest professionals.
b) Acreage was collected as a continuous variable and artificially categorized during data processing; therefore, the 10-acre cutoff appears arbitrary.
c) Forest policies based on findings derived mostly from smaller landowners are inefficient, as many small landowners must be surveyed to equate to the acreage of one owner of a large acreage.

Coverage Error

Because of the multiple sequential steps used to identify the sample population, the NWOS likely suffers from coverage error. There are likely landowners who are eligible to take the survey who have no possibility of being selected to participate.

The NWOS is dependent upon remote sensing to identify whether the selected point within the hexagon has forest on it. An unknown percentage of hexagon points were manually checked to verify they were forested; however, it is unclear if the ground-truthing process also quantified false negatives (i.e., points that remote sensing classified as not forest but that were determined to be forest through ground-truthing). Error could also be introduced at this stage depending upon the age of the image layer used to conduct remote sensing via GIS analysis, though the error an aged image layer would introduce is minor in comparison to the remote sensing detection error.

    The definition of “wooded land” may also be problematic in calculating coverage error because it includes land that formerly was wooded or will be regenerated in the future. The land that was formerly wooded, or may be wooded in the future, would have a vegetation signature of understory, which is different from tree canopy, and may not be detected in the remote sensing process.

COVERAGE ERROR – Coverage error is introduced into the study when viable candidates for participation are excluded, rendering the sample unrepresentative of the population in question.


Finally, coverage error may be introduced when cross-referencing spatial data to tax records, as the merge of the two record systems is imperfect. Additionally, tax records rely on self-reported information and are themselves imperfect. Evidence of this error is reflected in the 8% non-deliverable rate. This final step of merging the tax records is likely the largest contributor to coverage error in the NWOS.

Observations

Based on the documentation available, the following observations were made:

    1) The coverage error in the NWOS is considerable and is likely one of the largest contributors to the total survey error.

2) The manner in which the NWOS identifies respondents is innovative, though unconventional in the social sciences. If the NWOS is to continue, consideration should be given to separating the NWOS from the FIA program so the NWOS can switch to a sample selection method that is less problematic. This may be difficult because of the institutional connectedness to the FIA program, as evidenced by the following quote: "We recommend conducting this reassessment [NWOS] no more frequently than every 5 years, the minimum amount of time needed to complete a full FIA cycle in any state" (Caputo et al 2017). This quote is illustrative of a possible unwillingness to consider that sociological information may be needed at different intervals than biological information.

3) The crosswalk process from the GIS output of the remote sensing to the tax record introduces error.
a) The percentage of acreages identified as privately-owned forest that were successfully matched to a legitimate tax record was not reported. The number of forest acreages identified but without accurate contact information is unknown.
b) There are errors in the tax record, which presumably led to the 8% nondeliverable rate. Those potential respondents were treated as if they were ineligible (known-ineligibles) rather than simply not contactable (nonrespondents of unknown disposition). This is incongruent with AAPOR standard practices and causes the stated cooperation rate to further diverge from the response rate.
c) Larger acreages are more likely to be trusts, LLCs, subsidiaries, or to have multiple owners, and are subsequently more difficult to contact through the tax record (e.g., remote management or PO boxes that go unchecked). Additionally, there are likely more intermediate steps in getting the questionnaire to the person responsible for the decision making for the acreage. This would introduce a strategic bias toward underrepresentation of respondents with larger acreages.
d) The process for determining if a "corporation is only for tax purposes" is not well documented, and it is unclear if the process is objective. There is no estimate of interrater reliability (i.e., two researchers evaluating the corporation independently and then comparing) for this determination. The criteria for inclusion or exclusion in this category are not identified, and the determination is therefore not repeatable.
e) If the criteria for inclusion or exclusion in this category are based on self-reported information, the process is imperfect, as self-reporting to a federal government agency is likely to be biased, particularly on issues relating to taxation.


4) As prior research establishes a known point of diminishing returns, it is unclear why the random selection was not stratified prior to data collection, with post hoc weighting (i.e., survey respondents who represent more of the population are weighted more heavily) performed afterward. This approach should be considered in future NWOS iterations, particularly since states are a known domain of interest to many stakeholders.

    5) Because the 6,000-acre hexagons are adjacent to each other and are stratified to enforce a 5-year rotation, larger acreages are more likely to be included in the survey on an annual basis.

6) The study was conducted with replacement (i.e., when a viable sample unit is drawn from the population but does not respond to the survey, researchers will often attempt to contact another sample unit that is similar to the original), but it is unclear at what stage of the study a sample unit was replaced.
a) If the sample unit was replaced when the point in the hexagon did not fall into a forested area, the estimate of forest is too high (this is likely not the case).
b) If the sample unit was replaced when a tax record could not be found, the study likely overestimates owners of small acreages, for the reasons noted earlier in this section regarding the difficulty of contacting owners of large acreages (this is a probable scenario).
c) If the sample unit was replaced when someone was contacted but ultimately was a nonrespondent, large acreages would be underestimated because large acreages would have a higher probability of being resampled and then subsequently discarded for already being a nonrespondent. If the hexagon was resampled until a replacement was found, then smaller acreages are likely overrepresented.

    7) The nonresponse bias check was only performed on people who provided a telephone number on the tax record. An address-to-phone match should be performed with commercially available crosswalks to reduce the coverage error in the nonresponse bias check.

Nonresponse Error

The NWOS suffers from nonresponse error, the extent of which is unquantifiable because of the sparsity of methodological documentation as well as the structure of the study design. Based on the acreage of the U.S., there ought to be approximately 409,000 hexagons representing the potential population of forests. The study has a self-imposed 5-year rotation, meaning that, in any given year, approximately 82,000 points are examined to determine if a forest is present. The actual quantity of points examined for forest using remote sensing is not reported. It is also unknown how many points were positively identified as forest using remote sensing, as well as how many points were positively matched to an individual through the tax record. The documentation is also unclear on whether a survey invitation was sent to all successful matches from the tax record, or if a sample was taken. The FFRC sent 21,250 invitations, 20,092 of which were usable in the backward comparison to the 2006 NWOS. Undeliverable mail accounted for 1,717 surveys; 8,650 mailed surveys were returned; 177 surveys were completed via the internet; and the telephone nonresponse bias check included 1,256 people. The nonresponse bias check asked approximately 22% of the full questionnaire. Once the comparison between mail and web respondents and telephone nonrespondents was complete, the nonresponse portion was added to the respondents (Butler et al 2016a). Commendably, the NWOS provided an opportunity for people to opt out of the survey if they did not own any forested property; however, the quantity of opt-outs is not reported, and it is unclear if these surveys were treated as a response, a nonresponse, or an ineligible sample point.

NONRESPONSE ERROR – Nonresponse error is introduced into the study when the respondents of a survey are strategically different from those who do not respond. When findings from the respondents are extrapolated to both the respondents and nonrespondents, misleading results occur.
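The disposition counts reported in this section make the distinction between a cooperation rate and a response rate concrete. Below is a sketch (the exact AAPOR formulas require eligibility estimates the reports do not provide):

```python
# Dispositions reported for the 2013 NWOS:
invited = 21_250
undeliverable = 1_717
mail = 8_650
web = 177
phone_check = 1_256   # telephone nonresponse bias check (~22% of questionnaire)

# Cooperation-style rate as apparently computed: undeliverables dropped from
# the denominator and phone bias-check cases counted as respondents.
coop = (mail + web + phone_check) / (invited - undeliverable)

# A more conservative rate: phone cases excluded as incomplete, and
# undeliverables kept in the denominator as unknown-eligibility cases.
conservative = (mail + web) / invited

print(f"cooperation-style rate: {coop:.1%}")          # 51.6%, near the stated 52%
print(f"conservative rate:      {conservative:.1%}")  # 41.5%
```

The roughly ten-point gap between the two figures is the substance of observations 3 and 4 below: how the undeliverables and the telephone cases are classified materially changes the headline number.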

Observations

Based on the documentation available, the following observations were made:

1) The nonresponse bias within the NWOS cannot be ignored.
a) The NWOS presents estimators assuming the absence of nonresponse bias; therefore, the uncertainty surrounding the estimates may be wider than reported.
b) The NWOS states, "With no means for fully testing nonresponse bias, mixed results (mostly supporting no bias) from the qualitative analysis, and no means for correcting bias, no corrections were made to the data, but these findings should be considered when interpreting the results" (Butler et al 2016a, p33).
i) Future NWOS studies should budget for a more rigorous nonresponse bias test to ensure the findings are accurate.
ii) Nonresponse bias is either present or not present in a study. During testing, two of the six variables were statistically different between respondents and nonrespondents. On one variable (presence of a written forest management plan), mail respondents were nearly twice as likely to have a forest management plan as the telephone nonrespondents (Butler et al 2016a, p33). Stating these are mixed results is inaccurate.
iii) At one point, the authors state, "Currently, the naïve approach to item nonresponse (i.e., not adjusting for nonresponse) is the only feasible option for the NWOS. This approach potentially biases the NWOS estimates, but to what degree is unknown, as testing for item nonresponse bias is beyond the resources of NWOS at this time" (Butler et al 2016a, p34). Stating that there are no means to correct nonresponse bias is inaccurate, as there are weighting scenarios that could have mitigated the nonresponse bias during the study design phase. Because it is not known to what degree nonresponse potentially biases the NWOS estimates, the findings of this study may be called into question, and the policies guided by this research may be tenuous.
iv) Most readers are not going to read the technical report or consider the nonresponse bias in interpreting the results. Most of the educated lay population would not understand how to account for this nonresponse bias or the magnitude by which results would need to be adjusted.
c) Most nonresponse bias tests would have included more than six questions.

2) The nonresponse bias toward more engaged landowners may have been reduced if one more wave of the questionnaire had been sent out.

3) The specified cooperation rate of 52% (37-64% range across states) is not the same as a response rate.


    a) Although the authors of the NWOS acknowledge the difference between cooperation and response rates in some documents, they often confound the two by referring to cooperation rates in ways that would normally be considered a response rate.

b) The NWOS should consider designing the study so that a response rate can be calculated using the standardized methods of the research field (AAPOR standardized response rates), allowing peer scientists to evaluate the quality of the research.

    c) The NWOS study is less salient for small woodland owners as compared to larger forest owners. Typically, those who are more engaged in the topic (in this case owners of larger acreages), respond in earlier waves of the survey. Those less engaged (in this case owners of smaller acreages), respond in later waves of the survey. Nonrespondents are more similar to the last wave of the survey. Therefore, the nonresponse may not be random.

d) The number of telephone calls made to achieve the 1,256 respondents is not documented.

e) Because only the tax record was used to create the list of potential nonresponse bias check participants, the nonresponse bias check has coverage error that confounds the nonresponse error. Additionally, there is likely a strategic bias in the nonresponse bias check, as connecting to the decision makers of large acreages via the telephone mode was more difficult for the same reasons listed in the mail mode of data collection.

4) Simultaneously using a telephone subsample to estimate nonresponse bias and to augment the sample is incongruent with standard research practices.

a) Over 12% of the NWOS response sample comes from the telephone mode, which collected approximately 22% of the full questionnaire. The item nonresponse for the telephone mode of data collection is too high for these contacts to be considered responses. Therefore, it is inappropriate to add the telephone nonresponse bias check responses to the mail and web responses.

b) Including the telephone nonresponse bias check in the cooperation rate mischaracterizes the percentage of people who engaged with the survey in a meaningful way.

    5) Telephone nonrespondents were contacted by the USDA National Agricultural Statistical Service; however, the training they received to reduce interview bias is not documented.

6) The geographic distribution of the nonresponse bias check is unknown.

7) There is no mention of how the NWOS engaged with non-English-speaking respondents.

a) Given that approximately 5% of the U.S. population does not speak English, and a substantially larger portion prefers communication in another language, a translated questionnaire should be considered.

    b) This is important because in some cases non-English speakers are shown to differ in their conservation attitudes from their English-speaking counterparts (Chase et al 2016).

8) When respondents from different modes of data collection are compared, the comparison should be presented with a statistical test and an associated effect size. The largest nonresponse error across modes of data collection in the NWOS was the question regarding having a management plan; presenting a statistical test would make it easier to evaluate whether the modes of data collection differed and to what magnitude (a chi-square test with an associated Cramér's V or φ [phi]).
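A mode comparison of this kind takes only a few lines. The counts below are hypothetical, chosen only to illustrate computing a chi-square statistic with its associated Cramér's V, not actual NWOS data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = mode of data collection,
# columns = has a written management plan (yes / no).
table = np.array([[180, 420],   # mail/web respondents
                  [ 55, 545]])  # telephone nonresponse-bias check

chi2, p, dof, expected = chi2_contingency(table)

# Cramer's V; for a 2x2 table this is identical to phi.
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))

print(f"chi2 = {chi2:.1f}, p = {p:.2g}, Cramer's V = {cramers_v:.2f}")
```

With these illustrative counts the test rejects the null hypothesis of equal proportions, and the effect size quantifies how large the mode difference is.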


    9) Presenting the estimators without addressing the nonresponse is problematic, as nonresponses differ from mail and web responses. In the words of the author:

    An assumption behind the new estimators outlined thus far is full response; that is, all ownerships contacted fill out all items (full item response) on the survey and return it (full unit response). This is far from reality, with NWOS unit response rates being around 50% (Butler et al 2005). For reasons too complex to detail here, nonresponse could be ignored in past NWOS estimation with implicit assumptions about the nature of nonresponse. (Dickinson and Butler 2013, p322)

10) The NWOS did not complete a wave analysis.

a) The Dillman method (2014) has an initial notification plus three waves of data collection and two postcard reminders to reduce nonresponse bias. The third mailing is meant to convert nonrespondents into respondents; an additional objective of the third mailing is to facilitate a wave analysis.

b) The NWOS conducted only an initial notification, two waves of mailings, and a reminder.

c) Late responders are theoretically and empirically more similar to nonrespondents than early responders. Therefore, information about the nonrespondents could be derived from late responses to the NWOS if a wave analysis were completed.
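A wave analysis of the kind described in item 10 amounts to comparing key variables across mailing waves and inspecting the trend. The records and variable names below are hypothetical, intended only to show the mechanics:

```python
import pandas as pd

# Hypothetical respondent records: wave 1 = first mailing, wave 3 = final.
# Because late-wave respondents tend to resemble nonrespondents, a
# monotonic trend across waves suggests nonresponse bias in that variable.
df = pd.DataFrame({
    "wave":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "acres":    [900, 750, 600, 300, 250, 200, 60, 40, 20],
    "has_plan": [1, 1, 0, 1, 0, 0, 0, 0, 0],
})

by_wave = df.groupby("wave").agg(
    mean_acres=("acres", "mean"),
    pct_plan=("has_plan", "mean"),
)
print(by_wave)
```

In this toy data, mean acreage and the share with a management plan both fall across waves, which would indicate that nonrespondents likely own smaller acreages and are less engaged, exactly the pattern the review describes.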

11) Because states and acreage sizes are domains of interest, response rates by state and acreage size should be provided. At a minimum, the following table should be provided:

State | Total Hexagons | Hexagons With Private Forest | Hexagons Matched to the Tax Record | Total Mail Invites | Mail Responses | Web Responses | Phone Nonresponse Bias Check | Undeliverable
01 AL | … | … | … | … | … | … | … | …
02 AK | … | … | … | … | … | … | … | …
04 AZ | … | … | … | … | … | … | … | …
05 AR | … | … | … | … | … | … | … | …
…
US | … | … | … | … | … | … | … | …


Measurement Error

The NWOS has a variety of potential measurement errors. There are several points on the questionnaire where responses may be adjusted to socially desirable responses (e.g., the questions regarding using a management plan [#11], land use [#12], and using a professional forester [#12b]). Similarly, questions associated with compliance with laws may have some bias. For example, questions regarding income from land usage may be altered because land leases may be conducted in cash and not reported as income. The respondent may not report monies collected for fear of being reported to the Internal Revenue Service.

    Respondents may also strategically alter their answers in an effort to manipulate the outcomes of the survey. The questions pertaining to how the respondent’s wooded land was acquired, management decision-making, future land use, and the percent of income derived from wooded land may have strategic bias to manipulate the policies that may be influenced by the results of the NWOS.

Further, questions of self-assessed knowledge are infamous in social science for introducing measurement bias. Respondents consistently overestimate their level of knowledge, regardless of topic. Questions about respondents' familiarity with cost-share programs, green certifications, tax-based incentives, and the assignment of development rights are likely biased and overestimate the familiarity respondents have with these topics and programs.

With a few minor exceptions, the NWOS does not have measurement error that derives from questions that are ambiguous, lengthy, confusing, or otherwise poorly constructed. Generally, the word "or" should be avoided on questionnaires, as it is problematic for a few reasons. For example, when NWOS respondents are asked about concerns for their forest and they mark "trespassing or poaching," it is unclear whether trespassing or poaching is the issue at hand for that landowner. The same issue exists with the "unwanted insect or disease" option and several similar questions throughout the NWOS.

Pretesting revealed another problem: "or" leads some respondents to interpret a question with multiple conditions as requiring all of those conditions to be met. An example from the NWOS pretest is the question "Was a professional forester used to plan, mark, contract, or oversee any of the cuts?" (#12b). Many respondents would interpret the question as asking them to indicate "YES" if a forester was used in any of the four conditions listed, while others would indicate "YES" only if a forester was used in all four conditions and "NO" if a forester was used in three or fewer of them.

Lastly, in questions with options listed, the options should be mutually exclusive and collectively exhaustive (Vaske 2008). There are a few occasions when the NWOS does not meet this standard (e.g., question #14: an owner could conceivably "cut trees for sale" with the purpose of "reducing fire hazard" or "to improve wildlife habitat").

    Measurement error occurs when respondents deliberately or unintentionally provide responses that are not reflective of their true attitude or behavior, or are factually incorrect.



Observations

Based on the documentation available, the following observations were made:

1) The NWOS has social desirability bias in several questions.

a) The NWOS should review the questionnaire for sound psychometric properties.

    i) The phrase “before today” should be added in front of “how familiar are you with…” style questions. This allows respondents to save face regarding the fact that they may not know a topic, and it allows them to indicate that they now know about the programs but may not have in the recent past.

    ii) Add the phrase “Did you have a chance to” in front of the compliance questions. This softens the question and allows respondents to provide more accurate information.

2) The NWOS did not test for respondent fatigue, which is a consideration for a questionnaire of this length.

a) The questionnaire should be constructed with improved psychometric properties, including randomly shifting the order of the question blocks and potential responses. Research has shown response options listed lower in the questionnaire are consistently rated lower (cf. Vaske 2008). Randomly shifting the order negates this effect.

3) The NWOS has areas where the cognitive load on the respondent seems high, as there are 15 pages of questions (which may lead to a reduced response rate).

a) Questions appearing on the survey that have not been used in a meaningful way in the resulting documents may be candidates for removal.

b) A quantitative cognitive review of the NWOS should be conducted to identify questions that significantly increase response burden.

4) The NWOS has some irregularities in how the data were handled, which likely resulted in measurement error.

a) A quantitative psychometric analysis of the research should be conducted.

    i) The NWOS combined the mail and electronic responses into one dataset to compare to the telephone nonresponse. It is inappropriate to combine the two modes of data collection before a MultiGroup Confirmatory Factor Analysis (MGCFA) is completed. An MGCFA can determine if the difference observed between two modes of data collection is a true difference between the populations or if it is an artifact of the questionnaire being presented in two different ways (due solely to the mode of data collection). There is a wealth of literature showing the mode of data collection alters a person’s response to the same question, and many social scientists consider an MGCFA as a prerequisite for combining data collected via different modes (mail, electronic, telephone).

    ii) Future NWOS should include an MGCFA test to see if the responses are different solely due to the mode of data collection (e.g., mail vs. web).

    b) Interviewer bias could be reduced by using telephone interviewers who are not part of the research group (and the training received should be documented).

c) Further, documented tests should be included for:

i) Straight-lining or patterning (i.e., people who go straight down the line in a grid or have an apparent pattern to their responses)

ii) Racing (i.e., people who complete the survey in an abnormally short time)


    iii) Numeric inconsistencies (e.g., a respondent aged 15 who indicated an education level of PhD, MD, JD, or other terminal degree)

iv) Quality checks (e.g., a row in a grid asking the respondent to check a specific answer)

v) Strategic biasing (i.e., including an answer option that is similar to current conditions)
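The first three of these checks are straightforward to automate. In the sketch below, the responses, column names, and the five-minute racing threshold are all hypothetical, chosen only to illustrate the flagging logic:

```python
import pandas as pd

# Hypothetical responses: five Likert items, completion time, age, education.
df = pd.DataFrame({
    "q1": [3, 5, 4], "q2": [3, 2, 2], "q3": [3, 5, 5],
    "q4": [3, 1, 1], "q5": [3, 4, 4],
    "minutes": [22.0, 3.5, 25.0],
    "age": [54, 40, 15],
    "terminal_degree": [False, False, True],
})
likert = ["q1", "q2", "q3", "q4", "q5"]

# Straight-lining: zero variation across a grid of Likert items.
df["straight_liner"] = df[likert].nunique(axis=1) == 1
# Racing: completion time far below a plausible minimum (assumed 5 minutes).
df["racer"] = df["minutes"] < 5
# Numeric inconsistency: e.g., a 15-year-old reporting a terminal degree.
df["inconsistent"] = (df["age"] < 21) & df["terminal_degree"]

print(df[["straight_liner", "racer", "inconsistent"]])
```

Each flag isolates one respondent in the toy data: the first straight-lines the grid, the second races through the survey, and the third gives numerically inconsistent answers.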

Processing Error

The NWOS suffers from processing error in a few areas. The foremost processing error is treating continuous data as categorical data (e.g., Likert-style questions are treated as categorical). This could be considered by some to be an acceptable, though antiquated, practice (Stevens 1946). However, treating the data as categorical limits the explanatory power of that variable (Vaske 2008). Further, the NWOS artificially dichotomizes or categorizes continuous variables, such as acreage size. When these data are analyzed as categorical data, they have less explanatory power, that is, less ability to find useful relationships between variables. This loss of explanatory power is known as the cost of dichotomization and is well documented in the scientific literature (Cohen 1983). This approach is outdated and should be reevaluated in future revisions of the NWOS (see full discussion in the Estimator Error section).
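The cost of dichotomization is easy to demonstrate by simulation: median-splitting one of two correlated continuous variables attenuates the observed correlation (Cohen 1983). A minimal sketch with simulated, not NWOS, data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Two continuous variables with a true correlation of ~0.50
# (think of an acreage-like predictor and some outcome of interest).
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)

r_continuous = np.corrcoef(x, y)[0, 1]

# Median-split x into two categories, analogous to binning acreage.
x_dichot = (x > np.median(x)).astype(float)
r_dichotomized = np.corrcoef(x_dichot, y)[0, 1]

# Cohen (1983): a median split attenuates r by a factor of ~0.80,
# discarding roughly a third of the variance explained.
print(f"continuous r        = {r_continuous:.3f}")
print(f"after median split r = {r_dichotomized:.3f}")
```

The simulated correlation drops from about 0.50 to about 0.40 after the split, which is the attenuation the review refers to as the cost of dichotomization.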

The other processing error that may be present in the NWOS surrounds the weighting of data. Weighting is an essential component of survey research. When sampling, the respondents sometimes differ from the population of interest. Weighting is used to ensure the sample is representative of the entire population by making small mathematical adjustments to the data: respondents who represent a disproportionately large share of the population are weighted a little more heavily, while respondents who represent a disproportionately small share are weighted a little less.

It is unclear how the NWOS was weighted because each research document treats the topic of weighting in a slightly different manner. In approximately 2013, after constructive criticism within the peer-reviewed literature (Metcalf et al 2012), the NWOS altered its estimation procedures. A reader can surmise from later documents and academic journal articles that the data were weighted in some fashion (likely the Hansen-Hurwitz Estimator [HHE], Horvitz-Thompson Estimator [HTE], or similar). For NWOS-related documents after 2013, the data are weighted inversely proportional to their chance of being selected in the 60,000-acre hexagon (referred to as probability proportional to size [PPS]). No additional weighting was provided to account for the various cooperation rates among the different states, which ranged between 37% and 64%. The crosstabulations of replies, the number of ownerships, and the estimates are not directly proportional when compared in the 2013 NWOS TableMaker; therefore, it is clear the data are somehow weighted, although the method is not specified.
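For reference, the Horvitz-Thompson estimator mentioned above weights each response by the inverse of its inclusion probability. A minimal sketch with hypothetical values (not NWOS probabilities):

```python
import numpy as np

# Hypothetical sample: y_i is the response of interest (e.g., acres owned)
# and pi_i is each ownership's inclusion probability under PPS sampling.
y  = np.array([40.0, 120.0, 800.0, 2500.0])
pi = np.array([0.001, 0.003, 0.020, 0.060])

# Horvitz-Thompson estimator of the population total: each response
# stands in for 1/pi_i ownerships like it.
ht_total = np.sum(y / pi)
print(f"Estimated population total: {ht_total:,.0f}")
```

The sensitivity of the estimate to the assumed inclusion probabilities is exactly why the review argues the weighting method, and any trimming or adjustment of the weights, must be fully documented.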

    Data processing error is introduced during data preparation, coding, misapplication of weights, or misreporting.



Observations

Based on the documentation available, the following observations were made:

    1) The methods used to weight a dataset of this size need to be clearly stated so end-users and constituents can understand the repercussions.

    2) The NWOS weighting strategies are somewhat outdated: The Hansen and Hurwitz Estimator (1943) and the Horvitz-Thompson Estimator (1952) were developed during a time of limited access to computing. There are other estimators that may be an improvement upon these models.

    3) The FFRC has provided different, often conflicting, narratives regarding the weighting process, at one point verbally stating there were no weights included in the statistical analysis. To date, FFRC has not provided their values for weighting the data.

    4) There is little evidence the NWOS uses different weightings for estimates provided at the national level and the state level. Most studies would use different weightings when providing estimates for different geographies of interest. This is important because within each domain of interest (in this case a state) there may be varying proportions of sample to population, necessitating a different weighting schema.

5) It is unclear how the NWOS adjusted the weightings in the states with intensified sampling, as it is not documented in the technical documents or the statistical syntax (the written log of all data manipulation and statistical analysis).

    6) The NWOS claims “ownerships are selected with probability proportional to size (PPS) because points on the ground have equal probabilities of selection.” However, after replicating the methods of the NWOS as described in the technical documents (i.e., beginning at the center of the hexagon, randomly choosing an azimuth, 0-359.999, and randomly selecting a distance, 0-4989.999 feet), it is clear there is a biased selection for acreages toward the center of the hexagon. Below is a graph showing the results of a simulation of 10,000 sampling points using the methods of the NWOS. A kernel density estimator clearly indicates a bias for acreages toward the center of the hexagon.
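The described draw can be reproduced in a few lines. Approximating the hexagon as a disk with radius equal to the maximum distance, a uniform-azimuth, uniform-distance draw places about half of all points within the inner half-radius, where a spatially uniform (area-proportional) draw would place only a quarter:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
R = 4990.0  # feet, approximating the NWOS maximum distance

# NWOS-style draw: uniform distance from the hexagon center
# (the uniform azimuth does not affect the radial density).
r = rng.uniform(0, R, n)

# Under an area-proportional draw, the share of points within R/2 of the
# center would be (1/2)^2 = 25%.  The uniform-distance draw instead
# places ~50% of points there: a clear bias toward the center.
inner = np.mean(r < R / 2)
print(f"Share of points in inner half-radius: {inner:.1%} (area-uniform: 25.0%)")
```

The resulting point density is proportional to 1/r, which is the center bias the kernel density estimate in the review's simulation reveals.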


    7) In some documents, trimming of statistical weights was addressed. In other documents, it was not performed or was performed but not discussed.

    8) All versions of the NWOS should consider using a literate statistical programming approach, which is written to conform to human logic rather than the needs of statistical coding.

    9) It appears that much of the data processing is logged, but the NWOS should review its statistical analysis to eliminate manual data manipulations for the purpose of increasing replicability of the study and easing peer review.

General Commentary

There are several areas where the NWOS could be improved but that do not fit specifically into one type of error under the Total Survey Error paradigm. The following list constitutes some further areas where the NWOS can improve.

1) The NWOS lacks transparency, one of the hallmarks of quality science.

a) The public good derived from sharing public data must offset the potential risk posed to current study participants and to the ability to procure future responses. Future versions of the NWOS should consider creating an anonymized data file to facilitate sharing with peer scientists. There is scientific and legal precedent within other federal agencies that have shared anonymized data of similar sensitivity.

2) Expending effort in states with very few family-owned forests, knowing beforehand that the response would not likely be large enough to produce state-level estimates, is not fruitful. A more advantageous use of those resources would be to acknowledge that the sample threshold would not be met and then reallocate those survey resources. Another approach would be to increase the sampling intensities in those states with traditionally few family-owned forests.

    3) The NWOS has an ambiguous sample unit. At times it seems as though the acreage is the sample unit, at other times it seems as though the owner of the acreage is the sample unit. The distinction is minor but important because the results can change depending upon which owner (which spouse or partner) is answering for the acreages.

    4) It is not clear how the NWOS randomized among the spouses and partners as to who should take the survey.

5) The quality assurance of the NWOS may be inadequate. From 2011 to 2013, NWOS quality assurance manually examined only 310 surveys (37 questions, 283 fields) and found 186 errors. A certain amount of error is anticipated with automated coding, but this rate is higher than expected.

6) Poor questionnaire design can lead to greater response burden, larger item nonresponse, higher dropout rates, and responses that are considered lightly (known as low elaboration). While the NWOS is well-designed for the most part, there are a few small changes that could result in an improved questionnaire (see Vaske 2008 and Dillman et al 2014 for details):

a) Tracking numbers or barcodes reduce response rates, so they should be obfuscated from the respondent.

b) Adding more white space helps a respondent engage with a survey.

c) Pictures on the cover page produce better results than drawings.

d) Surveys should begin with engaging questions.

e) All question stems should have congruent verbs to ease reading.


f) All unfamiliar terms should be defined (e.g., "certified green").

g) Construction conventions for demographics should be followed (i.e., choose to ask for gender or sex but do not mix the two; list races in the order that most reduces response burden rather than alphabetical order).

    h) The survey is reported to have taken 25 minutes to complete. This is within the recommended time frames for a mailed survey (Vaske 2008); however, shorter surveys result in higher response rates.

7) The NWOS does not seem to consider "core forests" in its sampling design. Core forest is defined as the portion of a forest that is at least 120 meters from any forest/non-forest boundary. Even under ideal conditions (the acreage is square), an acreage of 20 acres would have under half an acre of core forest. In comparison, an acreage of 1,000 acres would have 780 acres of core forest. This is not meant to diminish the contributions of smaller forests, but clearly there are more social, economic, and environmental benefits associated with larger, contiguous forests. This disparate value is not considered in the sampling design.

a) Acreages in the 1-10 acre category have no core forest.

b) The following is the relationship between the size of acreages and the percent of core forest each proffers.

[Figure: Percent of Core Forest by Size of Acreage — percent of forest that is core forest (0% to 100%, y-axis) plotted against size of acreage in acres (0 to 10,000, x-axis)]
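The square-parcel geometry behind these figures can be computed directly. The sketch below assumes the idealized square shape stated in the text and the standard conversion of 4,046.86 square meters per acre:

```python
import math

ACRE_M2 = 4046.86        # square meters per acre
CORE_BUFFER_M = 120.0    # core forest: at least 120 m from any edge

def core_forest_acres(acres: float) -> float:
    """Core forest area for an idealized square parcel of the given size."""
    side = math.sqrt(acres * ACRE_M2)      # side length in meters
    core_side = side - 2 * CORE_BUFFER_M   # 120 m buffer removed per edge
    if core_side <= 0:
        return 0.0
    return core_side ** 2 / ACRE_M2

for a in (10, 20, 100, 1000):
    core = core_forest_acres(a)
    print(f"{a:>5} acres -> {core:7.1f} acres core ({core / a:.0%})")
```

The results reproduce the text's figures: a 10-acre square parcel has no core forest at all, a 20-acre parcel has just under half an acre, and a 1,000-acre parcel has roughly 776 acres (about 78% core), close to the 780 acres cited.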


    Final Recommendations

    The following are recommendations aimed at improving the scientific rigor and accessibility of the NWOS. They are not presented in any specific order.

1. A recurrent theme of this critique is that the methodology of the NWOS is sparsely documented. Within peer-reviewed journal articles, a certain amount of latitude is given to the author to exclude certain details so journal space is not wasted. However, technical documents and reports should detail this information fully. When this information is not included, peer review of the NWOS is somewhat difficult.

    2. Large research projects that involve a great deal of resources are frequently overseen by a committee of diverse stakeholders that direct, give input, and advise the research process. The NWOS operates without such an oversight committee, which reduces the chance that constituents can provide input into the research effort.

    3. Comparisons of the NWOS across time should be avoided due to the differences in geographies included, modes of data collection, response rates, Likert scales, and study designs. Even if these are accounted for, there are too many confounding factors to state that the changes observed in the data are due solely to the passage of time.

    4. The NWOS would benefit greatly from having a quantitative survey methodologist and a psychometrician involved in the design and implementation of future studies.

5. Other large research projects (e.g., at NOAA and USFWS) have recently been privatized to take advantage of the efficiencies gained by research conducted in the private sector; such a move may be considered for the NWOS as well.

    6. If a TableMaker-style product is to be replicated in 2018, the option to analyze the data according to ownerships or according to acreage should be more explicit.

    a. If descriptive statistics are going to be provided on the TableMaker, frequentist statistics and effect sizes ought to be simultaneously provided and explained so the reader can understand the nature of the differences shown.

b. The NWOS TableMaker is very useful to constituents. However, only general crosstabulations can be done with this product. The authors clearly have a mastery of the R statistics program, and utilizing the Shiny package for R would be a great improvement over the TableMaker application. This interface would magnify the usefulness of the NWOS to stakeholders.

    7. The term “wooded land” is contentious, particularly to the owners of the largest acreages of private forests.

a. Using the term "forest" and defining the term in the same manner