Example process chain

Here is an example of the use of the provided toolset, e.g. for land cover classification.

Training data collection

Motivation: There are two main categories of image classification methodologies: unsupervised and supervised. Unsupervised classification creates natural groupings in the image values, called spectral clusters or classes. In this fashion, values with similar grey levels are assumed to belong to the same cover type. The analyst must then determine the identity of these spectral clusters. Supervised classification relies on a priori knowledge of the location and identity of the land cover types that are in the image (training data). This can be achieved through field work, study of aerial photographs or other independent sources of information (http://www.ccrs.nrcan.gc.ca/glossary/index_e.php).

1. Use of field data

Existing field data may be used for land cover classification. There are, however, several points to be considered before employing the field plots:
- Are there differences between the image date and the field data collection date?
- Is the sampling design suitable (e.g. dense enough, covering all required classes)?
- Does the sample plot size correspond to the pixel size?
- Do the attributes collected in the field allow derivation of the aspired land cover classes?

2. Use of remotely sensed data

Concerning independent remotely sensed data, Google Earth is one possibility for collecting the training data, especially in areas with up-to-date, high-resolution imagery.

2.1. Training data collection tool

There is a tool to be used with Google Earth (TZ_TD_Collector.swf), tailored for the NAFORMA LULC classification task. It starts from elementary land cover levels and goes into detail within each land cover class. The forest cover percentage is also recorded, as well as the type and date of the imagery available in Google Earth at each point. Pre-defined locations (100 x 100 m squares) are used as training areas. The idea of using the tool is that the output is coherent, i.e. all required information is collected at each point and there is no variation in the class names etc. Areas with old or coarse-resolution imagery can be left uninterpreted, or they can be screened out later based on the recorded attributes. It is advisable to check the locations over the imagery employed in the actual classification, as there may be clouds.

There is a tool, PointsToSquares.py, which creates a KML file containing the 100 x 100 m training squares from a list of coordinates provided by the user. The sampling design (systematic, stratified) can be selected by the user. Within the training squares there are 25 systematically laid plots, which can be used to assess the crown coverage. The script gengrid.bash can be used to generate a systematic sample with user-determined distances in the X and Y directions, as sketched below. The produced list of coordinates can be used as input to the KML-generating program PointsToSquares.py.
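
gengrid.bash itself is provided with the toolkit; purely as an illustration of the idea, here is a minimal Python sketch that generates such a systematic grid and writes it to a CSV file. The bounding box, the 1 km spacing and the output file name are invented for the example, and the real script's options may differ.

    # Minimal sketch of systematic grid generation (the idea behind
    # gengrid.bash); bounding box, spacing and output layout are assumptions.
    import csv

    def generate_grid(xmin, ymin, xmax, ymax, dx, dy):
        """Yield (x, y) points on a systematic grid with spacing dx, dy (map units)."""
        y = ymin
        while y <= ymax:
            x = xmin
            while x <= xmax:
                yield x, y
                x += dx
            y += dy

    # Example: 1 km spacing over a small extent in projected coordinates (metres).
    with open("grid_points.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for x, y in generate_grid(500000, 9000000, 510000, 9010000, 1000, 1000):
            writer.writerow([x, y])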

The interpreted training areas are saved into XML files, which can then be converted into shapefiles using the scripts GExml2csv.bash and CsvToPolygon.py.



2.2. Subjectively selected training areas

Training areas may also be selected subjectively and digitized using Google Earth's built-in facilities. There are two scripts, genericGEkml2csv.bash and genericCsvToPolygon.py, which are meant for converting the produced separate KML files into one shapefile. When selecting the training areas subjectively, extra care must be taken not to avoid areas that are difficult to interpret, and to ensure the spatial and informational coverage of the training sites.

3. Use of other auxiliary data

Existing GIS data may also be used, especially to support image interpretation. In the case of a good but slightly outdated map, the training data could be sampled from the map, outliers (i.e. changed areas) removed from the sample, and a new classification run.
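
One way such outlier screening could be done (this is a generic sketch, not a toolkit script) is to drop sampled pixels whose spectral values lie far from their class mean; the threshold and the input arrays below are assumptions.

    # Sketch of screening map-sampled training pixels for outliers (changed
    # areas): keep a pixel only if every band value lies within a given
    # number of standard deviations of its class mean. Threshold and input
    # arrays are assumptions.
    import numpy as np

    def screen_outliers(values, labels, max_sd=2.5):
        """values: (n, bands); labels: (n,) -> boolean keep-mask of shape (n,)."""
        keep = np.zeros(len(values), dtype=bool)
        for cls in np.unique(labels):
            sel = labels == cls
            mu = values[sel].mean(axis=0)
            sd = values[sel].std(axis=0) + 1e-9   # avoid division by zero
            z = np.abs((values[sel] - mu) / sd)
            keep[sel] = (z <= max_sd).all(axis=1)
        return keep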

Image pre-processing

1. Geometric correction/image reprojection

The generic program gdalwarp can be used for these purposes. It is an image mosaicking, reprojection and warping utility. The program can reproject to any supported projection, and can also apply GCPs stored with the image (http://www.gdal.org/gdalwarp.html).
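
Besides the command line, recent GDAL versions expose the same functionality in Python through gdal.Warp. A minimal reprojection example; the file names and the target projection (EPSG:32736, UTM zone 36S) are placeholders:

    # Reprojecting an image with GDAL's Python bindings; input/output names
    # and the target projection are placeholders.
    from osgeo import gdal

    gdal.UseExceptions()
    gdal.Warp("scene_utm36s.tif", "scene_raw.tif", dstSRS="EPSG:32736")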

2. Atmospheric correction to directional surface reflectance

Directional surface reflectance represents the value that would be measured by an ideal sensor held at the same sun-view geometry and located just above the Earth's surface if there was no atmosphere (http://ledaps.nascom.nasa.gov/docs/pdf/SR_productdescript_dec06.pdf).

Motivation:
- Removes atmospheric distortions (caused by emission and absorption) within one image
- Improves the correlation between the ground truth and the pixel values
- Allows use of common training data over several scenes
- Allows the mosaicking of scenes, use of time series etc.
- (Is a prerequisite for methods which require images calibrated to ground reflectance)

Landsat imagery from USGS is treated using programs created in the LEDAPS project by NASA (http://ledaps.nascom.nasa.gov/). The program package performs atmospheric correction and produces surface reflectance values. The script ledapsscript.bash can be used; however, the program source code must be obtained from NASA and compiled on the user's system. Auxiliary data are also needed from the same source.

The output is a set of HDF files: one containing the corrected bands and some extra layers (atmospheric opacity and a quality assurance layer), one containing a cloud and snow mask, and one containing the thermal band.
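
The layers of such HDF files are exposed by GDAL as subdatasets, which gives a quick way to inspect what LEDAPS produced; the file name below is a placeholder:

    # Listing the subdatasets (bands, QA layers, masks) of an HDF output
    # file; the file name is a placeholder.
    from osgeo import gdal

    gdal.UseExceptions()
    ds = gdal.Open("lndsr.scene.hdf")
    for name, description in ds.GetSubDatasets():
        print(name, "->", description)
    # Each subdataset can then be opened like a normal raster, e.g.:
    # band = gdal.Open(ds.GetSubDatasets()[0][0])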

3. Correction of bidirectional reflectance effects

This step corrects surface bidirectional effects in remotely sensed images, where both the sun and viewing angles vary.

The reflectance from a target on the ground varies with both the incidence angle and the exitance angle (the angle between the surface normal and the view vector). The variation is caused by both topography and the anisotropic nature of how light reflects from the Earth's surface due to different land cover types. The light reflected from the surface is therefore highly dependent on the incidence and exitance angles, and the phase angle between the incidence and exitance vectors. One combination of incidence, exitance and phase angles results in one possible, bi-directional, reflectance value. An infinite number of combinations results in the bi-directional reflectance distribution function (BRDF) for a surface (http://www.scribd.com/doc/37526174/15arspc-Submission-23).


Motivation:
- Improves the correlation between the ground truth and the pixel values
- Allows use of common training data over several scenes
- Allows the mosaicking of scenes, use of time series etc.

For this, a tool is under development.

4. Creating image stacks

Motivation:
- Some programs require image composites rather than separate bands

The HDF data created in the atmospheric correction process (with ledapsscript.bash) can be converted into an Erdas Imagine .img composite using the script stack.bash.

In case a composite is needed directly from the TIFF band layers provided by USGS, the generic program gdal_merge.py can be used:

    gdal_merge.py -of HFA -o output_filename.img -separate b1.tif b2.tif b3.tif b4.tif b5.tif b6.tif b7.tif

5. Masking of clouds/cloud shadows and Landsat 7 gaps

Motivation:
- Remove cloud/shadow/gap pixels from the training data before classification, in case an algorithm is used which is sensitive to outliers.
- Obtain a wall-to-wall classification without blank areas. The cloud/shadow/gap pixels are filled based on the mask, using the stand-alone program gdal_gapfill. See point 6 below.

Some options:
- Unsupervised classification: the script cluster.bash can be used.
- Going for L7 gaps only: the script mask_single_image.bash can be used.
- LEDAPS project gap/cloud/shadow mask: no script is yet available for a single image, but for gap-filling (gdal_gapfill) purposes a masking script trim_mask.bash is provided, which extracts the bad data in two images of the same scene and detects the combined effective area of the images (sketched conceptually below).
- Manual digitizing: a script is to be provided soon, which combines the digitized AOI layers, inserts the gaps from the anchor and filler images and detects the combined effective area of the images.
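
As a conceptual illustration of what such a combined mask amounts to (this is not the trim_mask.bash code; the file names, the nodata value and the mask logic are assumptions), consider:

    # Conceptual sketch of a combined mask for an anchor and a filler image
    # of the same scene: the effective area is where at least one image has
    # data; fillable pixels are bad in the anchor but valid in the filler.
    # File names and the nodata value are assumptions.
    import numpy as np
    from osgeo import gdal

    gdal.UseExceptions()
    NODATA = 0

    anchor = gdal.Open("anchor.img").ReadAsArray()   # shape: (bands, rows, cols)
    filler = gdal.Open("filler.img").ReadAsArray()

    anchor_bad = np.all(anchor == NODATA, axis=0)    # gap/blank pixels
    filler_bad = np.all(filler == NODATA, axis=0)

    effective = ~(anchor_bad & filler_bad)           # combined effective area
    fillable = anchor_bad & ~filler_bad              # candidates for gap filling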

6. Filling of cloud/shadow/gap areas

Motivation:
- Obtain a wall-to-wall classification without blank areas.

The cloud/shadow/gap areas in the target image can be filled using another image, preferably from the same season and not too far apart in calendar years. A simple way is to substitute the missing areas with data from the second image. However, even when the image dates are close to each other and the atmospheric correction has been carried out, there may be dissimilarities. Therefore, a stand-alone program gdal_gapfill has been developed, which computes local regression models and fills the missing areas using the model. For large holes, a large-area regression model is applied. The user can define the parameters of the modelling windows for the local and large-area models. In areas with severe changes (e.g. clear-cuts) the filling will not succeed. The model is also sensitive to other outliers, so the cloud/shadow mask must be more or less perfect for the method to work well.
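
gdal_gapfill is a stand-alone program; its core idea, a local regression from the filler image to the anchor image applied at each masked pixel, can be sketched in Python as follows. The window size, the minimum observation count and the single-band arrays are assumptions, and the real program's large-area fallback is only hinted at in a comment.

    # Sketch of gap filling by local linear regression (the idea behind
    # gdal_gapfill): for each missing pixel, regress anchor values on filler
    # values inside a local window of valid pixels, then predict the missing
    # value from the filler image.
    import numpy as np

    def fill_gaps(anchor, filler, bad, half_win=15, min_obs=20):
        """anchor, filler: 2-D arrays of one band; bad: boolean gap mask."""
        out = anchor.astype(float).copy()
        rows, cols = anchor.shape
        for r, c in zip(*np.nonzero(bad)):
            r0, r1 = max(0, r - half_win), min(rows, r + half_win + 1)
            c0, c1 = max(0, c - half_win), min(cols, c + half_win + 1)
            ok = ~bad[r0:r1, c0:c1]
            if ok.sum() < min_obs:
                continue  # here the real program falls back to a large-area model
            x = filler[r0:r1, c0:c1][ok].astype(float)
            y = anchor[r0:r1, c0:c1][ok].astype(float)
            slope, intercept = np.polyfit(x, y, 1)   # local linear model
            out[r, c] = slope * float(filler[r, c]) + intercept
        return out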

Classification

There is a multitude of methods, of which we present a selection:

1. Unsupervised classification

Unsupervised classification algorithms include e.g. K-means clustering, ISODATA clustering, and Narendra-Goldberg clustering (http://www.ccrs.nrcan.gc.ca/glossary/index_e.php).

The problem with unsupervised classification is that the spectral classes do not necessarily match any informational classes, due e.g. to mixed pixels (mixels) and illumination conditions. At the least, a large number of preliminary classes is usually required; these can later be combined into the actual aspired classes.

In this package, a k-means algorithm is included (the stand-alone program gdal_kmeans). It clusters the input image into a user-defined number of clusters, based on a sample of pixels (the sampling density is determined by the user). The script cluster.bash can be used to run the program.
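
gdal_kmeans is a stand-alone binary; in the same spirit, a compact Python sketch of pixel clustering (sample pixels, iterate k-means, then label every pixel by its nearest cluster mean) could look as follows. All parameter names and defaults here are assumptions.

    # Compact k-means illustration: cluster a sample of pixels, then label
    # every pixel by its nearest cluster mean. For brevity the whole image
    # is processed at once; a real implementation would work in blocks.
    import numpy as np

    def kmeans_classify(image, n_clusters=20, sample_step=10, n_iter=25, seed=0):
        """image: (bands, rows, cols) array -> (rows, cols) cluster labels."""
        bands, rows, cols = image.shape
        pixels = image.reshape(bands, -1).T.astype(float)   # (rows*cols, bands)
        sample = pixels[::sample_step]                      # sampling density
        rng = np.random.default_rng(seed)
        means = sample[rng.choice(len(sample), n_clusters, replace=False)]
        for _ in range(n_iter):
            d = np.linalg.norm(sample[:, None, :] - means[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for k in range(n_clusters):
                if (labels == k).any():
                    means[k] = sample[labels == k].mean(axis=0)
        d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        return d.argmin(axis=1).reshape(rows, cols)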

2. Supervised classification

2.1. Maximum likelihood

The maximum likelihood method computes statistics from the training site pixels, and the probability density function gives the probability that a target pixel belongs to a certain class. The class with the highest probability is assigned to the pixel. For this, no tool is provided within this package yet.
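
Although the toolkit provides no tool, the method itself is simple to sketch: estimate a per-class mean vector and covariance matrix from the training pixels, then assign each pixel the class with the highest Gaussian log-likelihood. The array layout below is an assumption.

    # Minimal Gaussian maximum likelihood sketch; training arrays assumed.
    import numpy as np

    def ml_classify(pixels, training):
        """pixels: (n, bands); training: {class_id: (m, bands) array}."""
        scores = {}
        for cls, samples in training.items():
            mu = samples.mean(axis=0)
            cov = np.cov(samples, rowvar=False)
            inv, logdet = np.linalg.inv(cov), np.linalg.slogdet(cov)[1]
            d = pixels - mu
            # log of the multivariate normal density (constant term dropped)
            scores[cls] = -0.5 * (logdet + np.einsum("ij,jk,ik->i", d, inv, d))
        classes = list(scores)
        return np.array(classes)[np.argmax([scores[c] for c in classes], axis=0)]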

2.2. K nearest neighbors (k-nn)

This method performs a non-parametric regression. For each target pixel, we want to find the k most similar observations, from which we have measured the variables we are interested in. One way to determine the similarity is to calculate the simple Euclidean distance between the DNs of the target and the observations with measured variables. The method must be parameterized for the data and problem at hand, e.g. the number of nearest neighbors must be chosen. Increasing the value of k generally improves the accuracy of the estimates, but the amount of training data becomes a limiting factor.

The stand-alone program gdal_nn can be used; a conceptual sketch of the estimation follows.
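
The sketch below illustrates the estimation principle only (the unweighted mean of the k spectrally nearest training observations); it does not reproduce gdal_nn's actual interface or options.

    # Conceptual k-nn estimation sketch: for each target pixel, average the
    # measured variable over the k training observations with the smallest
    # Euclidean distance in spectral space. Arrays here are assumptions.
    import numpy as np

    def knn_estimate(targets, train_x, train_y, k=5):
        """targets: (n, bands); train_x: (m, bands); train_y: (m,) -> (n,)."""
        d = np.linalg.norm(targets[:, None, :] - train_x[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]      # indices of k nearest
        return train_y[nearest].mean(axis=1)        # simple (unweighted) mean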

2.3. Random Forest

For this, no tool is provided within this package yet.

Evaluation

Motivation:
- Accuracy affects the legal standing and operational usefulness of maps and reports derived from remotely sensed data (Campbell 2002).
- No image product should be released without descriptive error statistics.

Typically, error matrices are computed, from which producer's and user's accuracies, overall accuracy, kappa and adjusted kappa values can be derived. Possibilities for obtaining ground reference information for the error matrices include the use of independent test data (field observations, image interpretation, auxiliary data) and the use of leave-one-out cross-validation. Spatial variation is not covered by these measures, however. Maps of uncertainty can also be produced, revealing e.g. the distance to the class centre or the variation between nearest neighbors.
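
For reference, the standard figures can be computed directly from the error matrix; a small sketch, assuming integer-coded reference and predicted label arrays:

    # Error matrix and the usual accuracy figures from reference and
    # predicted class labels (integer-coded); the label arrays are assumed.
    import numpy as np

    def accuracy_report(reference, predicted, n_classes):
        m = np.zeros((n_classes, n_classes), dtype=int)   # rows: reference
        for r, p in zip(reference, predicted):
            m[r, p] += 1
        total = m.sum()
        overall = np.trace(m) / total
        producers = np.diag(m) / m.sum(axis=1)            # omission side
        users = np.diag(m) / m.sum(axis=0)                # commission side
        # Cohen's kappa: agreement beyond chance.
        chance = (m.sum(axis=1) * m.sum(axis=0)).sum() / total**2
        kappa = (overall - chance) / (1 - chance)
        return m, overall, producers, users, kappa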

After evaluation, additional training data may need to be acquired for classes that were detected poorly, and the classification re-run.

Editing of output

Motivation:
- Remove noise from the classified image (see the filter sketch below)
- Correct obvious errors
- Separate classes for which the classification failed
- Create subclasses using auxiliary data or user-determined filters
- Match the classifications on overlapping areas of two scenes
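
For the first point, a common simple operation is a 3 x 3 majority (mode) filter over the classified image; a sketch using scipy (the window size is an assumption, and class codes are assumed to be small non-negative integers):

    # Sketch of a majority (mode) filter for removing salt-and-pepper noise
    # from a classified image.
    import numpy as np
    from scipy import ndimage

    def majority_filter(classified, size=3):
        """classified: (rows, cols) integer class image -> filtered copy."""
        def window_mode(values):
            # Most frequent class code in the window (codes assumed >= 0).
            return np.bincount(values.astype(int)).argmax()
        return ndimage.generic_filter(classified, window_mode, size=size)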


Except where otherwise noted, this content is licensed under a Creative Commons Attribution License.
