Nairobi 1-2 October 2007 1
Some Approaches to Agricultural Statistics

Main approaches to agricultural statistics (1)
Expert subjective estimations
• In each administrative unit, a local expert fills in a form with his or her assessment
Census
• Farm census
• General population census (households)
List frame surveys
• Statistical sampling
  – Small administrative units can be used as the first sampling stage
• Selected farms (“purposive sampling”)

Main approaches to agricultural statistics (2)
Area frame sampling
• Observations on the ground
  – Crop area
  – Yield
    • Expert eye estimations
    • Objective measurements
• Remote sensed observations
  – Photo-interpretation
  – Classified images
  – Vegetation or yield indicators
Agro-meteorological models

Expert subjective estimates
Advantages
• Cheap if there is a network of agricultural experts
• No sampling error
• Easy to manage
• All items can be addressed
  – Crop area and yield
  – Livestock
  – Means of production
Disadvantages
• No indication of the accuracy of the data
• Quality (non-sampling error) is difficult to control unless a sample survey is carried out
• Changes are often underestimated
Generally not recommended,
• but in some situations it can be the only alternative.
Can be used as covariables to improve the accuracy of a sample survey
• Regression estimator or similar

Farm census
Advantages
• No sampling error
• Detailed information on the farm structure
Disadvantages
• Expensive: in general it can be carried out only every 10 years.
  – Heavy to manage in many countries.
  – Items that change every year are not included (e.g. crop area and yield)
• Only farms above a size threshold are included: bias (sometimes >10% systematic underestimate for a recent census).
• Possible additional bias if farmers think that the data can be used for tax purposes.
Can be used as a list frame for sample surveys

Population census (agricultural holdings)
Subset of the population census: holdings with some agricultural activity.
Compared to farm census:
Advantages
• Agricultural and non-agricultural activity (income) can be analysed together
• Smaller bias (part-time farming included)
Disadvantages
• Farm structure is more difficult to analyse
• More problematic to use as a list frame for sample surveys

List frame surveys from census
A statistical sample is selected from the census (farms or households)
Advantages
• Flexible: general or specialized surveys possible.
• Stratification can be very efficient if farms have heterogeneous sizes or are specialised in different productions.
• Often smaller bias than census (quality control is easier)
Disadvantages
• Bias can be important if:
  – Census is incomplete or not updated
  – Answers of farmers are not fully reliable
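
The stratification point can be illustrated with a minimal sketch of a stratified expansion estimator. It assumes simple random sampling without replacement inside each stratum; the function name, the stratum sizes and the sample values are hypothetical and only serve the illustration.

```python
import math
import statistics

def stratified_total(strata):
    """Estimate a population total and its standard error.

    strata: list of (frame_size, sample_values) pairs. frame_size is the
    number of farms listed in the stratum; sample_values are observations
    from a simple random sample drawn in that stratum (at least 2 values).
    """
    total = 0.0
    variance = 0.0
    for frame_size, values in strata:
        n = len(values)
        mean = statistics.fmean(values)
        s2 = statistics.variance(values)  # sample variance within the stratum
        total += frame_size * mean
        # Variance of the expanded stratum total, with finite population correction
        variance += frame_size ** 2 * (1 - n / frame_size) * s2 / n
    return total, math.sqrt(variance)

# Hypothetical frame: many small farms, few large ones (crop area in ha)
strata = [
    (1000, [1.0, 2.0, 3.0, 2.0]),    # small farms
    (50, [40.0, 60.0, 50.0, 50.0]),  # large farms
]
total, se = stratified_total(strata)
print(f"Estimated total: {total:.0f} ha, standard error: {se:.0f} ha")
```

The sketch only covers sampling error. An incomplete or outdated census, or unreliable answers, would bias the result without showing up in the standard error.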

Two-stage list frame if census is unavailable
A statistical sample of (small) administrative units is selected
A “mini-census” is made in each of the selected administrative units
A statistical sample is selected in each of the mini-censuses
Advantages compared to list frame survey on a census
• Easier to update the sampling frame (smaller bias)
Disadvantages
• Less flexible than sampling on a proper census
• Less efficient stratification

Purposive sampling of farms
A set of farms is selected without a proper statistical method.
Advantages
• Provides an emergency solution if a proper statistical method is not applicable
  – Crisis situations
• Avoids high rates of non-response
  – Data difficult to provide or sensitive (accountancy)
Disadvantages
• Requires a good knowledge of covariables in the population for extrapolation
• Very high risk of bias
May produce acceptable results on the inter-annual change rates.

Area Frame Sampling
The sampling frame is not a list of farms, but the geographic space divided into sampling units:
• Segments: portions of territory, generally 9 ha to 400 ha.
  – Physical boundaries: segments are delimited by roads, rivers or field limits
  – Geometric shape: squares….
• Points: in practice a “point” is conceived as a piece of land (3 x 3 m)
• Transects: straight lines of a given length
  – Often used for environmental observations.
Observation mode:
• Direct on the ground: crop, yield estimation….
• Interview with the farmers who manage the selected fields
Sampling techniques:
• Random or systematic
• Clustered or unclustered
• Stratified or non-stratified,
• ……
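
As an illustration of how point observations turn into an area estimate, here is a minimal sketch for the simplest case: an unclustered, non-stratified simple random sample of points. The function name, the crop labels and the region size are hypothetical, and the standard error uses the usual binomial approximation, which would not apply as written to clustered or systematic designs.

```python
import math

def area_from_points(observations, crop, region_area_ha):
    """Estimate crop area from a simple random sample of points.

    observations: list with the land cover observed at each sampled point.
    """
    n = len(observations)
    share = sum(1 for obs in observations if obs == crop) / n
    area = share * region_area_ha
    # Binomial approximation of the sampling error of the proportion
    se = region_area_ha * math.sqrt(share * (1 - share) / (n - 1))
    return area, se

# Hypothetical ground observations on 100 points in a 10 000 ha region
obs = ["maize"] * 30 + ["beans"] * 20 + ["other"] * 50
area, se = area_from_points(obs, "maize", 10_000)
print(f"Maize: {area:.0f} ha (standard error {se:.0f} ha)")
```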

Area Frame Sampling (2)
Advantages
• The sampling frame coincides quite precisely with the population
  – No (few) missing elements in the frame
  – No repeated elements
• Sampling units easy to define
  – Except for segments with physical boundaries
• Objective (if direct observations)
• Can be combined with remote sensing for further improvement
Drawbacks
• For direct observation, the date of the field visit can be critical. Problems appear if
  – Crop not yet emerged
  – Already harvested and insufficient traces left
  – Not clear if it will be harvested
• Locating the units (segments or points) requires reliable field survey material
  – Aerial photographs with a proper enlargement
  – GPS
    • Daily access to a reliable power supply
    • Technical ability to operate the device
    • Sometimes limitations due to security-military concerns

Yield observations on the ground (1)
Expert eye estimates on a statistical sample
• Possibly with the help of a table: number of ears per m², number of grains per ear, size of the grains…
Advantages
• Cheap
• Possibility of providing geo-referenced data to combine with satellite images
• Good results if the experts are reliable
Drawbacks
• Difficult to assess accuracy.
• Possible strong bias (interest in having higher/lower estimates: aid..)
Need for quality control
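
The kind of table mentioned above can be read as a product of yield components. The sketch below is only an assumption about how such a table might be built: it uses thousand-grain weight as a stand-in for "size of the grains", and the function name and the figures are hypothetical.

```python
def yield_t_per_ha(ears_per_m2, grains_per_ear, thousand_grain_weight_g):
    """Convert eye-estimated yield components into a yield in t/ha."""
    grams_per_m2 = ears_per_m2 * grains_per_ear * thousand_grain_weight_g / 1000
    # 1 ha = 10 000 m2 and 1 t = 1 000 000 g
    return grams_per_m2 * 10_000 / 1_000_000

# Hypothetical cereal field: 400 ears per m2, 30 grains per ear, 40 g per 1000 grains
estimate = yield_t_per_ha(400, 30, 40)
print(f"Estimated yield: {estimate:.1f} t/ha")
```

The arithmetic is exact, but the result is only as reliable as the expert's reading of each component, which is why quality control remains necessary.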

Yield observations on the ground (2)
Objective measurements.
• The crop is cut in a square of a given size (e.g. 1 m²)
• Precise weighing in laboratory.
Advantages
• Statistical error can be computed
• Possibility of providing geo-referenced data to combine with satellite images
• Good results if the enumerators are meticulous
Drawbacks
• Difficult to be precise in the application of the rules of sample collection.
  – Enumerators tend to avoid parts of the field with lower yield
• Possible bias (over-estimation), even applying coefficients for harvest loss.
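
A minimal sketch of how crop-cut weights could be turned into a yield estimate with a computable statistical error. It assumes the plots behave like a simple random sample; the function name, the weights and the 10% harvest loss coefficient are hypothetical.

```python
import math
import statistics

def crop_cut_yield(weights_g, plot_m2=1.0, harvest_loss=0.0):
    """Mean yield in kg/ha and its standard error from crop-cut plots.

    weights_g: grain weight in grams measured on each plot.
    harvest_loss: fraction of the crop assumed lost at harvest.
    """
    # 1 g/m2 is equivalent to 10 kg/ha
    per_plot = [w / plot_m2 * 10 * (1 - harvest_loss) for w in weights_g]
    mean = statistics.fmean(per_plot)
    se = statistics.stdev(per_plot) / math.sqrt(len(per_plot))
    return mean, se

# Hypothetical weights from five 1 m2 plots
mean, se = crop_cut_yield([300, 340, 280, 320, 360], harvest_loss=0.10)
print(f"Yield: {mean:.0f} kg/ha (standard error {se:.0f} kg/ha)")
```

The standard error reflects only the variability between plots. If enumerators avoid the poorer parts of the field, the over-estimation stays in the mean and is not visible in this figure.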

Remote sensed observations (satellite images)
Can provide information on area or yield
• As covariable: combined with a consistent ground survey
For area estimation
• Should not be used to substitute ground survey, except in particular cases:
  – Conflict (dangerous to go to the field)
  – No authorization (illegal crops, North Korea,…)
  – Very high accuracy in the identification of crops (>95%).
    • E.g. large fields of rice
For yield estimation
• Vegetation indexes in arid regions give good indications on inter-annual change
• Co-variable to be combined with geo-referenced measurements

Remote sensing: cost-efficiency assessment
The accuracy of a (normal) ground survey + remote sensing = accuracy of a more intense ground survey.
• Which option is cheaper?
• Elements needed to assess:
  – Cost structure of the ground survey
    • Fixed cost
    • Cost per additional Primary Sampling Unit (Administrative unit?) in the sample
    • Cost per additional Secondary Sampling Units (farms, segments, points, fields)
  – Cost of remote sensing
    • Images
    • Image processing
    • Combining remote sensing with ground data
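
The comparison above can be sketched as a small cost model. This is an illustration under stated assumptions, not a costing method from the presentation: the ground survey cost is taken as linear in the number of sampling units, and the "more intense" survey of equal accuracy is obtained by multiplying the number of Primary Sampling Units by a relative efficiency factor. All names and figures are hypothetical.

```python
import math

def ground_survey_cost(fixed, per_psu, per_ssu, n_psu, ssu_per_psu):
    """Fixed cost plus the cost of each PSU and of the SSUs observed in it."""
    return fixed + n_psu * (per_psu + per_ssu * ssu_per_psu)

def remote_sensing_is_cheaper(costs, n_psu, ssu_per_psu, rs_cost, relative_efficiency):
    """Compare normal survey + remote sensing with an intensified ground survey.

    relative_efficiency: assumed factor by which the ground sample would have
    to grow to reach the same accuracy without remote sensing.
    """
    with_rs = ground_survey_cost(**costs, n_psu=n_psu, ssu_per_psu=ssu_per_psu) + rs_cost
    n_equivalent = math.ceil(n_psu * relative_efficiency)
    intensified = ground_survey_cost(**costs, n_psu=n_equivalent, ssu_per_psu=ssu_per_psu)
    return with_rs < intensified

# Hypothetical cost structure, in any currency unit
costs = {"fixed": 20_000, "per_psu": 300, "per_ssu": 20}
# rs_cost covers images, image processing and the combination with ground data
worth_it = remote_sensing_is_cheaper(costs, 200, 10, rs_cost=60_000, relative_efficiency=2.0)
not_worth_it = remote_sensing_is_cheaper(costs, 200, 10, rs_cost=60_000, relative_efficiency=1.3)
print(worth_it, not_worth_it)
```

With these made-up figures, remote sensing pays off when it doubles the efficiency of the ground survey but not when the gain is only 30%.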

Agro-meteorological models
Require relatively complex information
• Soil map (water capacity)
• Phenological calendar (planting, flowering..)
• Coefficients describing the physiology of the plant.
• “Clean” meteorological data in nearly-real-time (10 days..)
• But simplified versions are also possible.
For yield estimates, results need to be combined with historical statistical data
• Historical results of the agro-met model needed (long process)
Inter-annual yield change indicators can be good without historical statistical data
• Geographical analysis of areas of concern
• Possibility to combine with coarse resolution satellite images (vegetation indexes….)

Some vocabulary
Bias and standard error
Bias ~ non-sampling error
• Systematic tendency
• No reduction with a higher amount of data
  – Cannot be removed with an exhaustive census, classified image of the whole territory, etc
• Usually difficult to compute
  – In general no formula available
Standard error ~ sampling error
• Due to the randomness of the sample
• Decreases when the sample grows
• In general formulas are available
  – Sometimes very complicated: simulation possible (bootstrap)
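
The difference between the two notions can be shown with a small simulation. The sketch below draws hypothetical farm reports that carry a constant over-reporting of 5 units, then estimates the standard error of the mean by bootstrap for a small and a large sample. The function name, the seeds and all numeric values are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_se(sample, estimator, n_boot=500, seed=0):
    """Standard error of an estimator, approximated by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

TRUE_MEAN = 100.0
BIAS = 5.0  # hypothetical systematic over-reporting
rng = random.Random(42)

def biased_sample(n):
    """Simulated reports: true value plus random noise plus a constant bias."""
    return [rng.gauss(TRUE_MEAN, 20.0) + BIAS for _ in range(n)]

small = biased_sample(50)
large = biased_sample(2000)
se_small = bootstrap_se(small, statistics.fmean)
se_large = bootstrap_se(large, statistics.fmean)
error_large = statistics.fmean(large) - TRUE_MEAN

print(f"Standard error, n=50:   {se_small:.2f}")
print(f"Standard error, n=2000: {se_large:.2f}")
print(f"Error of the large-sample mean: {error_large:.2f}")
```

The standard error shrinks as the sample grows, while the error of the large-sample mean stays close to the bias. The bootstrap sees only the sample, so it cannot detect the bias.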

Variables and co-variables
Use of these terms in the context of this presentation (and often in sampling survey techniques)
We have a targeted result (e.g. Crop Area)
Variable (or main variable): usually refers to a magnitude that (nearly) coincides conceptually with the targeted result (direct observation)
• measured on a sample of units
  – farms,
  – households,
  – small administrative units,
  – territorial segments,
  – points,
  – fields
• Measurements nearly unbiased
Co-variable: usually refers to more biased measurements known for the whole population or a very large sample
• Subjective estimates
• Classified images
• Vegetation indexes.

Variables and co-variables (2)
Variables and co-variables can be combined
• Regression estimators and similar (difference, ratio…)
• Calibration estimators
• Small area estimators
If the main variable is (nearly) unbiased, the combined estimator is (nearly) unbiased
• Even if the covariable is biased
• If the variable and the co-variable are well correlated, the combined estimator has a smaller standard error
  – But the gain is limited
  – It is important that the estimator based on the main variable has a decent standard error
    • Good ground or farm survey.
    • Quality control
• When combining a variable (known for a sample) and a covariable (exhaustive knowledge), it is important that the covariable has the same quality in the sample and out of the sample
  – Do not improve the co-variable in the sample if you cannot improve it out of the sample.
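
A minimal sketch of the regression estimator, the first of the combinations listed above. It takes a main variable observed on a sample (for example the crop share measured on the ground in each segment) and a covariable known everywhere (for example the crop share in a classified image). The function name and the figures are hypothetical.

```python
import statistics

def regression_estimate(y_sample, x_sample, x_population_mean):
    """Regression estimator of the population mean of y.

    y_sample: main variable observed on the sampled units.
    x_sample: covariable on the same units.
    x_population_mean: mean of the covariable over the whole population.
    """
    y_bar = statistics.fmean(y_sample)
    x_bar = statistics.fmean(x_sample)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - x_bar) ** 2 for x in x_sample)
    slope = sxy / sxx
    # Correct the sample mean for the gap between sample and population covariable means
    return y_bar + slope * (x_population_mean - x_bar)

# Hypothetical crop shares on four sampled segments
ground = [0.16, 0.24, 0.38, 0.42]  # main variable: ground observation
image = [0.10, 0.20, 0.30, 0.40]   # covariable: classified image
estimate = regression_estimate(ground, image, x_population_mean=0.30)
print(f"Regression estimate of the crop share: {estimate:.3f}")
```

The image systematically under-reports the crop in this made-up example, yet the estimate is anchored on the ground data. The covariable only shifts the sample mean according to how the sampled segments differ from the whole territory, which is why it must have the same quality inside and outside the sample.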