09 CDL, Classification accuracy v. training data error
![Page 1: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/1.jpg)
Cropland Classification Accuracy as a Function of Training Data Accuracy
United States Department of Agriculture
National Agricultural Statistics Service
Research and Development Division
Spatial Analysis Research Section
David M. Johnson, Geographer
Association of American Geographers, 2010 Annual Meeting
![Page 2: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/2.jpg)
Study Overview
• Want to understand how potential errors in training data impact decision-tree-based land cover classification
– Especially tailored to mapping efforts within NASS
– Primarily in regions dominated by common commodity crops
• Hypothesis: Classification accuracy decreases as training data accuracy decreases
– By how much?
– Is there a threshold?
– What's the relationship?
– Is it linear?
– Are there scenarios where it improves the outcome?
• Chose 3 states to test these questions
– Iowa
– Idaho
– North Dakota
![Page 3: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/3.jpg)
Operational land cover mapping within NASS
![Page 4: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/4.jpg)
Classification Methodology Overview
1) “Stack” AWiFS, TM, MODIS, and ancillary data layers within a raster GIS
• 56 m grid cells, Albers Conic Equal Area projection, common extent by state
• some compromised imagery (from clouds, haze, data gaps, etc.) is acceptable
2) Sample spatially from the stack within known ground truth from FSA (ag. categories) and NLCD (non-ag. categories)
• a heavy sample rate (hundreds of thousands of pixels) is employed at the pixel level
3) “Data-mine” the samples using Boosted Classification Tree Analysis to derive best-fitting decision rules
• implemented with Rulequest See5.0, interfaced with ERDAS Imagine via the “NLCD Mapping Tool”
4) Create the land cover map by applying the derived decision rules back to the input data stack
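Steps 1 and 2 can be illustrated with a small plain-Python sketch. It is not NASS's implementation: it assumes every layer has already been resampled to a common 56 m grid and extent, stores each layer as a nested list, and treats a ground-truth value of 0 as "no ground truth" (a hypothetical convention). `stack_layers` and `sample_training_pixels` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple

Layer = List[List[float]]        # one co-registered raster layer: rows x columns
Stack = List[List[List[float]]]  # rows x columns x layers


def stack_layers(layers: Sequence[Layer]) -> Stack:
    """Step 1 (sketch): combine co-registered layers into per-pixel feature vectors."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same grid and extent")
    return [[[layer[r][c] for layer in layers] for c in range(cols)]
            for r in range(rows)]


def sample_training_pixels(stack: Stack, truth: List[List[int]], rate: float,
                           seed: int = 0) -> List[Tuple[List[float], int]]:
    """Step 2 (sketch): randomly sample pixels that fall inside known ground truth.

    A truth value of 0 means "no ground truth" (hypothetical convention).
    Each sample is (feature vector, land cover category).
    """
    rng = random.Random(seed)
    samples = []
    for r, truth_row in enumerate(truth):
        for c, category in enumerate(truth_row):
            if category != 0 and rng.random() < rate:
                samples.append((stack[r][c], category))
    return samples


band_a = [[0.1, 0.2], [0.3, 0.4]]
band_b = [[10, 20], [30, 40]]
truth = [[1, 0], [2, 1]]
stack = stack_layers([band_a, band_b])
print(sample_training_pixels(stack, truth, rate=1.0))
# [([0.1, 10], 1), ([0.3, 30], 2), ([0.4, 40], 1)]
```

In practice the stack lives in a raster GIS and is sampled at hundreds of thousands of pixels; nested lists just keep the example dependency-free.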
[Flowchart: Imagery stack (independent data), plus agricultural ground truth (via the USDA Farm Service Agency) and non-agricultural ground truth (using the National Land Cover Dataset as a proxy) (dependent data) → Rulequest See5.0 (derives decision tree-based classification rules) → generated rule set → output “Cropland Data Layer”. Also labeled: “Manages and visualizes datasets”.]
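Step 4 amounts to evaluating the derived rules once per pixel of the stack. The sketch below uses a hypothetical rule format (a list of layer/threshold/operator conditions plus a category) and a simple first-match rule list with a default category. See5.0's own rule files, rule combination, and boosting are more involved, so this only illustrates the idea of applying rules back to the input data.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule format (not See5.0's file format): a list of
# (layer index, threshold, operator) conditions and the category to assign.
Condition = Tuple[int, float, str]
Rule = Tuple[List[Condition], int]


def rule_matches(conditions: List[Condition], pixel: Sequence[float]) -> bool:
    """True if the pixel's feature vector satisfies every condition."""
    for layer, threshold, op in conditions:
        value = pixel[layer]
        if op == "<=":
            ok = value <= threshold
        elif op == ">":
            ok = value > threshold
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def classify_stack(stack: List[List[List[float]]], rules: List[Rule],
                   default: int) -> List[List[int]]:
    """Step 4 (sketch): label each pixel with the first matching rule, else `default`."""
    return [[next((label for conds, label in rules if rule_matches(conds, pixel)), default)
             for pixel in row]
            for row in stack]


stack = [[[0.1, 10.0], [0.2, 20.0]], [[0.3, 30.0], [0.4, 40.0]]]
rules = [([(1, 25.0, ">"), (0, 0.35, "<=")], 5), ([(0, 0.15, "<=")], 1)]
print(classify_stack(stack, rules, default=9))  # [[1, 9], [5, 9]]
```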
![Page 5: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/5.jpg)
Example Classification Subset
CDL classification (red = sugar beets, brown = soybeans, tan = spring wheat, gold = corn, yellow = sunflowers)
Resourcesat-1 AWiFS, 6 July 2007 (red = SWIR band, green = NIR band, blue = red band)
![Page 6: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/6.jpg)
Accuracy Assessment
Each classification is tested against an independent set of ground truth data to determine overall and within-class accuracies.
[Panels: example classification subset; example validation subset]
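Overall and within-class accuracies are conventionally read off a confusion matrix built from the independent validation pixels. Producer's accuracy is the share of a category's reference pixels that were mapped correctly (the omission side); user's accuracy is the share of pixels mapped to a category that truly belong to it (the commission side). A minimal sketch, assuming the validation data are two equal-length label sequences (`accuracy_report` is a hypothetical helper):

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def accuracy_report(reference: Sequence[Hashable], predicted: Sequence[Hashable]
                    ) -> Tuple[float, Dict[Hashable, Tuple[float, float]]]:
    """Overall accuracy plus (producer's, user's) accuracy per category.

    reference: ground-truth category of each validation pixel
    predicted: category the classification assigned to the same pixel
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("need equal-length, non-empty label sequences")
    pairs = Counter(zip(reference, predicted))  # confusion-matrix cells
    ref_totals = Counter(reference)             # pixels truly in each category
    map_totals = Counter(predicted)             # pixels mapped as each category
    overall = sum(n for (r, p), n in pairs.items() if r == p) / len(reference)
    per_class = {}
    for k in set(ref_totals) | set(map_totals):
        hits = pairs[(k, k)]
        producers = hits / ref_totals[k] if ref_totals[k] else float("nan")
        users = hits / map_totals[k] if map_totals[k] else float("nan")
        per_class[k] = (producers, users)
    return overall, per_class


reference = ["corn", "corn", "corn", "corn", "soy", "corn"]
predicted = ["corn", "corn", "corn", "soy", "soy", "soy"]
overall, per_class = accuracy_report(reference, predicted)
print(round(overall, 3))   # 0.667
print(per_class["corn"])   # (0.6, 1.0)
```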
![Page 7: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/7.jpg)
Degradation methodology
[Flowchart: original sample file with no known errors (dozens of columns and hundreds of thousands of rows in reality) → altered sample files with every X’th row scrambled in the column holding the land cover category value (every row, every other row, every third row, every fourth row, etc.) → run classifier (Rulequest See5.0) on each file → output land cover map.]
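The slide does not say exactly how rows were scrambled. The sketch below assumes one plausible reading: the land cover values in every N'th row are randomly permuted among those rows while all other columns stay unchanged (N = 1 means every row). `scramble_every_nth` and `error_rate` are hypothetical helpers. Because a permuted label can land back on its true value, especially when a few classes dominate, the resulting error rate is below 1/N; in expectation it is roughly (1/N)(1 − Σ p_k²), where p_k is the share of category k. That may be why even the every-row error levels reported later stay below 100%.

```python
import random
from typing import List, Sequence


def scramble_every_nth(labels: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Copy `labels`, randomly permuting the values in rows n, 2n, 3n, ... (1-based)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    rng = random.Random(seed)
    out = list(labels)
    positions = list(range(n - 1, len(out), n))  # 0-based indices of every n'th row
    values = [out[i] for i in positions]
    rng.shuffle(values)                          # scramble only the selected rows
    for i, v in zip(positions, values):
        out[i] = v
    return out


def error_rate(original: Sequence[int], altered: Sequence[int]) -> float:
    """Fraction of rows whose label no longer matches the original."""
    return sum(a != b for a, b in zip(original, altered)) / len(original)


labels = [i % 5 for i in range(1000)]
for n in (1, 2, 3, 4):
    altered = scramble_every_nth(labels, n, seed=1)
    print(n, round(error_rate(labels, altered), 3))
```

Each altered label column would then be written back into its own copy of the sample file and passed to the classifier.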
![Page 8: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/8.jpg)
2009 Iowa Cropland Data Layer
![Page 9: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/9.jpg)
Iowa ’09 CDL input layer examples
Scenes of data actually used: 10 AWiFS, 10 TM, 2 MODIS NDVI, DEM, Canopy, and Impervious (dates ranged from 1 April ’09 to 8 August ’09)
[Example panels: four AWiFS scenes, four TM scenes, MODIS, DEM, Canopy, Impervious]
![Page 10: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/10.jpg)
Iowa classifications with training data error %
[Map panels, one per training data error level: 0.0%, 69.6%, 34.8%, 23.2%, 17.4%, 13.9%, 9.9%, 7.0%, 3.5%]
Total scene has 46,474,682 pixels, 755,116 (1.6%) chosen for training
gold = corn, dark green = soybeans
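The slide lists the error levels but not how each was produced. Dividing the highest level (69.6%, presumably the every-row file) by each listed level gives near-whole numbers, which is consistent with, though not confirmed by, scrambling every 2nd, 3rd, 4th, 5th, 7th, 10th, and 20th row. A quick check of that arithmetic (`implied_intervals` is a hypothetical name):

```python
# Training data error levels (%) listed under the Iowa classification panels.
iowa_levels = [69.6, 34.8, 23.2, 17.4, 13.9, 9.9, 7.0, 3.5]


def implied_intervals(levels):
    """Ratio of the highest (every-row) error level to each level, rounded."""
    highest = levels[0]
    return [round(highest / level) for level in levels]


print(implied_intervals(iowa_levels))  # [1, 2, 3, 4, 5, 7, 10, 20]
```

The same function can be applied to the percentages listed on the Idaho and North Dakota classification slides.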
![Page 11: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/11.jpg)
[Chart: Iowa '09 CDL, Classification accuracy v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification accuracy (0% to 100%).]
![Page 12: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/12.jpg)
[Chart: Iowa '09 CDL, Classification Kappa v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification Kappa (0 to 1).]
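The slides plot Kappa without giving its formula. Assuming the standard Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement with the validation data and p_e is the agreement expected by chance from the reference and map category totals, a minimal sketch is below (`cohen_kappa` is a hypothetical helper). Kappa discounts chance agreement, which matters when a couple of classes such as corn and soybeans dominate the scene.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa between reference and mapped labels of validation pixels."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty label sequences")
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_totals = Counter(reference)   # row totals of the confusion matrix
    map_totals = Counter(predicted)   # column totals of the confusion matrix
    expected = sum(ref_totals[k] * map_totals[k] for k in ref_totals) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


reference = ["corn", "corn", "corn", "corn", "soy", "corn"]
predicted = ["corn", "corn", "corn", "soy", "soy", "soy"]
print(round(cohen_kappa(reference, predicted), 3))  # 0.333
```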
![Page 13: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/13.jpg)
[Chart: Iowa '09 CDL, Classification producer's accuracy v. training data error; series: Corn, Soybeans. x-axis: training data error (0% to 100%); y-axis: classification producer's accuracy (0% to 100%).]
![Page 14: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/14.jpg)
[Chart: Iowa '09 CDL, Classification user's accuracy v. training data error; series: Corn, Soybeans. x-axis: training data error (0% to 100%); y-axis: classification user's accuracy (0% to 100%).]
![Page 15: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/15.jpg)
[Chart: Iowa '09 CDL, Classification bias v. training data error; series: Corn, Soybeans. x-axis: training data error (0% to 100%); y-axis: classification bias estimate (-20% to 20%).]
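The slides plot a "classification bias estimate" without defining it. One common definition, assumed here, is the relative difference between the number of validation pixels mapped to a category and the number truly in it. Under that definition the bias also equals producer's accuracy divided by user's accuracy, minus 1, because both accuracies share the same count of correctly mapped pixels. A positive value means the category is over-mapped; a negative value means it is under-mapped. `areal_bias` is a hypothetical helper.

```python
from typing import Hashable, Sequence


def areal_bias(reference: Sequence[Hashable], predicted: Sequence[Hashable],
               category: Hashable) -> float:
    """(mapped - reference) / reference pixel count for one category (assumed definition)."""
    ref_count = sum(1 for r in reference if r == category)
    map_count = sum(1 for p in predicted if p == category)
    if ref_count == 0:
        raise ValueError(f"{category!r} does not occur in the reference data")
    return (map_count - ref_count) / ref_count


reference = ["corn", "corn", "corn", "corn", "soy", "corn"]
predicted = ["corn", "corn", "corn", "soy", "soy", "soy"]
print(areal_bias(reference, predicted, "corn"))  # -0.4 (corn under-mapped)
print(areal_bias(reference, predicted, "soy"))   # 2.0 (soy over-mapped)
```

For corn in this toy example, producer's accuracy is 3/5 and user's accuracy is 3/3, and 0.6 / 1.0 - 1 gives the same -0.4.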
![Page 16: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/16.jpg)
2009 Idaho Cropland Data Layer
![Page 17: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/17.jpg)
Idaho ’09 CDL input layer examples
Scenes of data actually used: 15 AWiFS, 7 MODIS NDVI, DEM, Canopy, and Impervious (dates ranged from 29 September ’08 to 1 September ’09)
[Example panels: eight AWiFS scenes, plus MODIS, DEM, Canopy, and Impervious]
Idaho classifications with training data error %
[Map panels, one per training data error level: 0.0%, 82.5%, 41.2%, 27.5%, 20.6%, 16.5%, 11.8%, 8.2%, 3.3%]
Total scene has 69,018,509 pixels, 891,793 (1.3%) chosen for training
![Page 19: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/19.jpg)
[Chart: Idaho '09 CDL, Classification accuracy v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification accuracy (0% to 100%).]
![Page 20: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/20.jpg)
[Chart: Idaho '09 CDL, Classification Kappa v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification Kappa (0 to 1).]
![Page 21: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/21.jpg)
[Chart: Idaho '09 CDL, Classification producer's accuracy v. training data error; series: Alfalfa, Winter wheat, Spring wheat, Barley, Potatoes, Idle, Corn. x-axis: training data error (0% to 100%); y-axis: classification producer's accuracy (0% to 100%).]
![Page 22: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/22.jpg)
[Chart: Idaho '09 CDL, Classification user's accuracy v. training data error; series: Alfalfa, Winter wheat, Spring wheat, Barley, Potatoes, Idle, Corn. x-axis: training data error (0% to 100%); y-axis: classification user's accuracy (0% to 100%).]
![Page 23: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/23.jpg)
[Chart: Idaho '09 CDL, Classification bias v. training data error; series: Alfalfa, Winter wheat, Spring wheat, Barley, Potatoes, Idle, Corn. x-axis: training data error (0% to 100%); y-axis: classification bias estimate (-20% to 20%).]
![Page 24: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/24.jpg)
2009 North Dakota Cropland Data Layer
![Page 25: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/25.jpg)
North Dakota ’09 CDL input layer examples
Scenes of data actually used: 14 AWiFS, 13 TM, 1 MODIS NDVI, DEM, Canopy, and Impervious (dates ranged from 6 May ’09 to 17 September ’09)
[Example panels: four AWiFS scenes, four TM scenes, MODIS, DEM, Canopy, Impervious]
![Page 26: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/26.jpg)
North Dakota classifications with training data error %
[Map panels, one per training data error level: 0.0%, 89.1%, 44.5%, 29.6%, 22.3%, 17.8%, 12.7%, 8.9%, 4.5%]
Total scene has 58,388,946 pixels, 737,633 (1.3%) chosen for training
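The training shares quoted for the three states follow directly from the pixel counts, and at the 56 m cell size stated on the methodology slide each pixel covers 3,136 m², so the totals also convert to scene areas. A short check of that arithmetic (the helper names are illustrative):

```python
# (total scene pixels, pixels chosen for training), as listed on the slides.
scenes = {
    "Iowa": (46_474_682, 755_116),
    "Idaho": (69_018_509, 891_793),
    "North Dakota": (58_388_946, 737_633),
}
CELL_SIZE_M = 56  # grid cell size from the methodology slide


def training_share(total: int, chosen: int) -> float:
    """Fraction of the scene's pixels used as training samples."""
    return chosen / total


def scene_area_km2(total: int, cell_size_m: float = CELL_SIZE_M) -> float:
    """Area covered by `total` square cells of side `cell_size_m`, in km^2."""
    return total * cell_size_m * cell_size_m / 1e6


for state, (total, chosen) in scenes.items():
    print(f"{state}: {training_share(total, chosen):.1%} of pixels, "
          f"{scene_area_km2(total):,.0f} km^2")
# Iowa: 1.6% of pixels, 145,745 km^2
# Idaho: 1.3% of pixels, 216,442 km^2
# North Dakota: 1.3% of pixels, 183,108 km^2
```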
![Page 27: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/27.jpg)
[Chart: North Dakota '09 CDL, Classification accuracy v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification accuracy (0% to 100%).]
![Page 28: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/28.jpg)
[Chart: North Dakota '09 CDL, Classification Kappa v. training data error; crop classes only. x-axis: training data error (0% to 100%); y-axis: classification Kappa (0 to 1).]
![Page 29: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/29.jpg)
[Chart: North Dakota '09 CDL, Classification producer's accuracy v. training data error; series: Spring wheat, Soybeans, Corn, Durum wheat, Canola, Sunflowers, Dry beans, Barley, Winter wheat, Peas. x-axis: training data error (0% to 100%); y-axis: classification producer's accuracy (0% to 100%).]
![Page 30: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/30.jpg)
[Chart: North Dakota '09 CDL, Classification user's accuracy v. training data error; series: Spring wheat, Soybeans, Corn, Durum wheat, Canola, Sunflowers, Dry beans, Barley, Winter wheat, Peas. x-axis: training data error (0% to 100%); y-axis: classification user's accuracy (0% to 100%).]
![Page 31: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/31.jpg)
[Chart: North Dakota '09 CDL, Classification bias v. training data error; series: Spring wheat, Soybeans, Corn, Durum wheat, Canola, Sunflowers, Dry beans, Barley, Winter wheat, Peas. x-axis: training data error (0% to 100%); y-axis: classification bias estimate (-20% to 20%).]
![Page 32: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/32.jpg)
[Chart: '09 CDL, Cropland classification accuracy v. training data error; series: Iowa, Idaho, North Dakota. x-axis: training data error (0% to 100%); y-axis: classification accuracy (0% to 100%).]
![Page 33: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/33.jpg)
[Chart: '09 CDL, Cropland classification Kappa v. training data error; series: Iowa, Idaho, North Dakota. x-axis: training data error (0% to 100%); y-axis: classification Kappa (0% to 100%).]
![Page 34: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/34.jpg)
[Chart: '09 CDL, Corn classification accuracy v. training data error; series: Iowa user's, Iowa producer's, Idaho user's, Idaho producer's, North Dakota user's, North Dakota producer's. x-axis: training data error (0% to 100%); y-axis: corn classification accuracy (0% to 100%).]
![Page 35: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/35.jpg)
[Chart: '09 CDL, Corn classification bias v. training data error; series: Iowa, Idaho, North Dakota. x-axis: training data error (0% to 100%); y-axis: classification bias estimate (-20% to 20%).]
![Page 36: 09 CDL, Classification accuracy v. training data error](https://reader031.vdocument.in/reader031/viewer/2022022416/5868d47b1a28abcd408c1a55/html5/thumbnails/36.jpg)
Conclusions
• Degradation of training data…
– degrades the classification.
– has relatively modest impacts on the classification until more than roughly 25% of the training data is in error; beyond that, accuracy falls rapidly, so the relationship is not linear.
– hurts the classification more when many classes are present.
– never improves a classification.
– affects the areal bias of different categories within the classification differently.