improving capacity in forest resources assessment in kenya (ic … · nyeri site was delineated...
TRANSCRIPT
Improving Capacity in Forest Resources Assessment
in Kenya (IC-FRA)
Technical Report on
LiDAR Assisted Estimation of Forest Resources
in Kenya
May 2016
1
© DRSRS, 2016
All rights reserved. No part of this guide may be reproduced or transmitted in any form or by any means,
electronic, electrostatic, magnetic or mechanical, including photocopying or recording on any information
storage and retrieval system, without prior permission in writing from either Director Natural Resources
Institute Finland (Luke), Kenya Forest Service (KFS), Kenya Forestry Research Institute (KEFRI),
Department of Resource Surveys and Remote Sensing (DRSRS); and Vice Chancellor University of Eldoret
(UoE) except for short extracts in fair dealing, critical scholarly review or discourse with acknowledgement.
Prepared by:
The Project Improving Capacity in Forest Resources Assessment in Kenya (IC-FRA) implemented 2013–2015
Compiled by:
András Balázs, Helena Haakana, Pekka Hyvönen, Heikki Parikka, Fredrick Ojuang, Faith Mutwiri, Peter
Nduati
Cover caption:
A true-colour 3D point cloud near Londiani, Kenya
2
Table of contents
Table of contents ..................................................................................................................................... 2
List of figures .......................................................................................................................................... 3
List of tables ............................................................................................................................................ 3
1 Introduction ....................................................................................................................................... 5
2 Test sites ............................................................................................................................................ 5
3 Data acquisition ................................................................................................................................. 6
4 Ground truthing ................................................................................................................................. 8
4.1 Preparation for field work ..................................................................................................... 8
4.1.1 Allocation of reference sample plots ..................................................................................... 8
4.1.2 Training of field teams ........................................................................................................ 10
4.1.3 Execution of ground truthing ............................................................................................... 11
5 Data analysis ................................................................................................................................... 12
5.1 Forest attributes ................................................................................................................... 13
5.2 Processing GNSS data ......................................................................................................... 16
5.3 Pre-processing LiDAR data and aerial imagery .................................................................. 17
5.4 Feature selection and biomass estimation ........................................................................... 18
5.5 Feature extraction ................................................................................................................ 18
5.6 Feature selection .................................................................................................................. 20
5.7 Biomass estimation .............................................................................................................. 22
6 Results ............................................................................................................................................. 22
7 Discussion ....................................................................................................................................... 25
8 Future use of the data ...................................................................................................................... 29
References ............................................................................................................................................. 31
3
List of figures
Figure 1. Delineation of the Kericho test area near Eldama Ravine.....................................................................5
Figure 2. Delineation of the Aberdare test area near Nyeri. .................................................................................6
Figure 3. False colour aerial imagery (near-infrared, red and green channels) and laser data coverage (red
line) of the Kericho test area, approximately 425 km2. .......................................................................7
Figure 4. False colour aerial imagery (near-infrared, red and green channels) and laser data (red line)
coverage of the Aberdare test area, approximately 175 km2. ..............................................................7
Figure 5. Plot of within-cluster sums of squares vs. number of clusters, Kericho dataset. ..................................9
Figure 6. Plot of within-cluster sums of squares vs. number of clusters, Aberdare dataset. ................................9
Figure 7. A sample plot in the Aberdare test sitea on steep slope with Mount Kenya in the background. ........12
Figure 8. Scatter plot of measured and estimated AGB on sample plots in Kericho. ........................................23
Figure 9. Scatterplot of measured and estimated AGB on sample plots in Aberdare. .......................................24
Figure 10. Scatterplot of measured and estimated AGB on sample plots on both test areas. .............................24
Figure 11. False color infrared image and biomass map in Kericho. .................................................................25
Figure 12. Biomass estimates with false color infrared imagery. .......................................................................25
Figure 13. Biomass estimates with DCM. ..........................................................................................................26
Figure 14. Uncertain plot position under dense canopy cover in Kericho. ........................................................26
Figure 15. Successful GNSS measurement in Kericho (grid size 1 mm). ..........................................................27
Figure 16. Successful GNSS measurement in Aberdare (grid size 5 mm). ........................................................27
Figure 17. Grid cells with very light vegetation ................................................................................................28
Figure 18. Sample plot in the Kerichotest site covered by shrubs. .....................................................................29
List of tables
Table 1. Symbols used in the equations. ............................................................................................................13
Table 2. Results of biophysical survey in Kericho. ............................................................................................16
Table 3. Results of biophysical survey in Aberdare. ..........................................................................................16
Table 4. Statistics of GNSS data collected in Kericho. ......................................................................................17
Table 5. Statistics of GNSS data collected in Aberdare. ....................................................................................17
Table 6. Table of 3D features. ............................................................................................................................18
Table 7. Table of raster features. ........................................................................................................................20
Table 8. Statistics of estimates usingall and selected features in Kericho. .........................................................21
Table 9. Statistics of estimates using all and selected features in Aberdare. ......................................................21
Table 10. Statistics of estimates using all and selected features in Aberdare and Kericho. ...............................22
Table 11. Statistics of measured and estimated total biomass on sample plots in Kericho. ...............................22
Table 12. Statistics of measured and estimated total biomass on sample plots in Aberdare. .............................22
Table 13. Statistics of measured and estimated total biomass on sample plots on both test areas. ....................23
4
Acknowledgements
This Technical report on LiDAR Assisted Estimation of Forest Resources in Kenya is a product of effort by a
large number of people and institutions. The compilers of this report would like to extend their gratitude to
institutions that provided staff and allowed them to support this noble initiative.
We most sincerely thank the Government of Finland and Government of Kenya for providing financial
support for the IC-FRA Project that has built capacity in terms of staff, equipment and knowledge in LiDAR
Assisted Estimation of Forest Resources. Special thanks are extended to Directors of Luke, DRSRS, KEFRI,
KFS; and the Vice Chancellor University of Eldoret for allowing their staff to participate in the Project.
This report will go a long way to inspire similar studies and promote application LiDAR Assisted Estimation
of Forest Resources in Kenya.
5
1 Introduction
Remote sensing has continued to play an increasingly vital role in natural resources assessment. Airborne
Laser Scanning (ALS) is one of the most dynamically developing methods used in local forest inventories
providing biomass and growing stock estimates. Laser scanning, also known as LiDAR (Light Detection and
Ranging) provides three dimensional point data (point cloud) whose characteristics are highly correlated with
forest attributes like total growing stock or above ground biomass. Based on the correlation between forest
attributes and laser/image features, attributes measured on sample plots can be estimated fully covering an
area of interest. The estimation not only provides figures but also spatial data as attributes are estimated for
each grid cell laid over the target area.
Airborne laser scanning is an active remote sensing technology. Airborne LiDAR systems consist of laser
sensor, computing and data storage unit, GPS unit and Inertial Measurement Unit (IMU). The laser scanner
emits a laser pulse and measures the time elapsed between emission and return(s) of the pulse. Based on the
speed of light, the elapsed time recorded and the exact position (GPS, IMU) of the sensor, the position of the
object which reflected the pulse can be calculated. Nowadays laser scanners are able to record up to 5-6
returns/pulse. Aircrafts used for laser scanning purposes are usually fixed wing planes. Depending on the
scanner and flying altitude different point densities can be reached in the final product. Higher flying altitude
results in lower point density and lower costs and lower flying altitude increases both costs and point density.
Point density for forest inventory purposes range from 0.5 to 2-3 points·m-2
.
In the “Improving Capacity in Forest Resources Assessment in Kenya” (IC-FRA) project, ALS assisted
estimation of above ground biomass was undertaken over two test areas in Kenya. The aim of the exercise was
to introduce a modern forest inventory technique and explore the possibility to use the acquired data in
estimation of forest stand characteristics, firstly for forest management purposes and secondly for
inventorying areas with poor accessibility.
The main phases of the exercise were:
- ALS data acquisition,
- Ground truthing and
- Data analysis.
2 Test sites
Two test sites were selected for the exercise. The first test site was located northwest from Nakuru near
Eldama Ravine in Baringo, Nakuru and Kericho Counties (Figure 1) and the other in Aberdares forest west of
Nyeri (Figure 2). The Kericho site was covered typically by plantation forests, and the Aberdare test site with
natural forests and bamboo forests. The total areas of the Kericho and Aberdare test sites were approximately
425 km2 and 175 km
2, respectively.
Figure 1. Delineation of the Kericho test area near Eldama Ravine.
6
Figure 2. Delineation of the Aberdare test area near Nyeri.
The Kericho site was selected in order to test ALS in estimation of forest characteristics for local forest
management planning. In addition to state owned plantation forests, the site was delineated to cover the
training forest around Kenya Forestry College (KFC) in Londiani. Nyeri site was delineated inside the
Aberdare national park to test the feasibility of laser data for estimating forest characteristics in remote areas
with natural vegetation.
3 Data acquisition
The Airborne Laser Scanning (ALS) and aerial image acquisition service was tendered in December
2013through a Finnish channel for national and international public procurements
(www.hankintailmoitukset.fi). The service was awarded to TerraTec AS from Norway. The data acquisition
was carried out on 10th-13
thof February, 2014. DRSRS provided the flight service with Cessna 208 (Caravan)
aircraft.
Flying altitude was 1550 m above ground, laser data density 2 pts/m2 and ground resolution of the aerial
imagery 25 cm. The aerial imagery was delivered as false colour infrared (near-infrared, red and green
channels) and true colour (red, green and blue channels).Full technical details about data acquisition is
provided in TerraTec’s final report attached to this document (Appendix 1).
The costs of data acquisition including aerial imagery and excluding ground truthing were 2.15 € per hectare
(aircraft: 26.000 €, data acquisition and processing: 103.000 €, area covered: 60000 ha).
7
Figure 3. False colour aerial imagery (near-infrared, red and green channels) and laser data coverage (red
line) of the Kericho test area, approximately 425 km2.
Figure 4. False colour aerial imagery (near-infrared, red and green channels) and laser data (red line)
coverage of the Aberdare test area, approximately 175 km2.
8
4 Ground truthing
In order to produce reliable estimates of various forest attributes including total biomass, field data, i.e.ground
truthing is required. Ground truthing was carried out by measuring reference sample plots in the test areas. It
was important that the location of the reference plots could be measured at the accuracy level of 2-3
decimetres. In order to meet this requirement, GNSS (Global Navigation Satellite System) devices were
procured. The field work included two types of measurements: collection of GNSS data and collection of
biophysical data of the standing vegetation.
The optimal allocation of reference plots was guided by the already acquired laser and aerial image data to
optimize the sampling design bearing in mind both results’ accuracy and financial limitations.
4.1 Preparation for field work
The preparation phase for ground truthing included allocation of sample plots to be measured, compilation of
a field manual and separate manuals for GNSS data collection and post-processing, and training of field
teams. The field manual1provides detailed information on how the field measurements were carried out.
4.1.1 Allocation of reference sample plots
The laser data and aerial imagery were utilized in the sampling design for the ground truthing. TerraTec
provided the pre-processed data in April, 2014 and the plan for the field measurements was prepared by
September, 2014. The main objective was to select a representative sample of the forests within the test sites
for the interpretation of the laser data. The test sites were first classified into homogenous areas according to
the laser and image data properties, and sample plots were then allocated to these classes proportionally to the
class area. The budget for the field work was a limiting factor and determined the total amount of sample plots
that could be measured in the field. The sampling design is described in detail in the following.
A grid with a cell size of 26.5 x 26.5 m was generated over both test areas. The cell size corresponded to the
area of a reference sample plot which radius was chosen to be 15 m. For each grid cell four features were
calculated from the laser and aerial image data. The selected features correlated strongly with some important
characteristics of land-cover and vegetation types, thus making the classification of grid cells possible. The
selected features were:
h85f: laser feature, 85 percentile of first pulse canopy heights (85% of the first pulse canopy height
observations fall below this height), correlates strongly with vegetation height;
vegf: laser feature, proportion of vegetation returns (returns from at least 2 m above ground; 2 m is
widely used as limit for vegetation pulses) from all returns (first returns), correlates strongly with
vegetation cover;
meannir: raster feature, average of pixel values of near-infrared channel, correlates with different
vegetation types and tree species groups;
idmnear: Inverse Difference Moment, raster texture feature of near-infrared channel, also called
Homogeneity, correlates with forest-non-forest land-cover types (forest: low homogeneity) and
canopy structure
Based on these features the grid cells were clustered (classified) into homogenous classes using the kmeans
tool of the Stats package in R statistical software. The tool k-means “aims to partition the points into k groups
such that the sum of squares from points to the assigned cluster centres is minimized” (kmeans help page in
R).
K-means clustering requires the number of clusters (classes) to be defined in advance. The number of clusters
used in this exercise was calculated by running k-means with a set of number of clusters to be considered. For
each test run the total sum of squares within clusters (the sum of the distances of observations from the cluster
1 Field Manual for LiDAR Assisted Estimation of Forest Resources in Kenya, 2016
9
centres) was plotted. It is expected that the sum of distances will decrease rapidly as the number of clusters
increase but the curve will flatten after a while. The bend in the curve provides the optimal amount of clusters.
Based on the test runs and the plots (Figures 4 and 5), the number of clusters, i.e. homogenous classes in
respect of the selected features, chosen for k-means was 130 and 100 in the Kericho and Aberdare dataset,
respectively.
Figure 5. Plot of within-cluster sums of squares vs. number of clusters, Kericho dataset.
Figure 6. Plot of within-cluster sums of squares vs. number of clusters, Aberdare dataset.
10
The total numberof sample plots allocated to the two test areas was 524. It was calculated by assuming that in
2.5 months (50 working days) four field teams, which were the budgetary constraints for the field work, can
measure 3 and 2 plots per day in Kericho and Aberdare, respectively. The sample plots were divided between
the test areas proportional to their sizes: 371 plots inKericho and 153 plotsin Aberdare test site.
As our interest lied in woody vegetation, grid cells with very few vegetation returns (vegf<10%) were
excluded from the process. At the same time classes with very low h85f values (all or nearly all h85f values
equal zero in the same class) were also discarded It is important to state though, the estimation of forest
attributes using k-nearest neighbor method requires sample plots with “zero-volume”, the samples were
generated indoors.
The sample plots were allocated to the classes proportional to the area covered by a particular class. In order
to decrease time used for moving from one plot to the next, sample plots were grouped together. The size of
groups was defined according to the assumed amount of plots to be measured in one field day. A group
consisted of one preselected “main” plot and one or two “accompanying” plots. The groups were formed
using the following criteria:
main plots have to be at a distance of at least 300/500 m from any other plot (Kericho/Aberdare);
two plots from the same class are not allowed in the same group;
accompanying plots have to be at a distance of 100-200 m from the main plot;
accompanying plots have to be at a distance of at least 100 m from any other plot;
Main plots were allocated into the classes with the highest areal coverage. Each class was further classified
into subclasses. The number of subclasses was equal to the number of plots to be allocated into the class.
Accompanying plots were selected around the main plots from the classes with less coverage.
4.1.2 Training of field teams
Trainings of field teams were conducted during two separate expert missions in three sessions. The first
training session on GNSS measurements and post-processing of GNSS2 data took place on 19
th, 22
nd and 23
rd
September, 2014. During the training participants learned how to configure and carry out measurements using
Javad Triumph-2 GNSS units and Android smart phones. In addition the usage of RTKLIB in post-correcting
the acquired data was discussed in detail.
The second training session on field measurements was conducted near Nakuru from 30th September to 3
rd
of October 2014 in various vegetation types including cypress plantations and bamboo forest. The four days
field training included GNSS data collection with Javad units3, measurements of vegetation on sample plots
and data recording using the most recent version of OpenForis Collect Mobile (OFCM), and downloading and
importing data to Collect Server. The Open Foris is a free and open-source software package developed at the
Food and Agricultural Organization (FAO) for forest inventory data collection and management. Parameters
to be measured, recorded and measurement methods were discussed throughout the training and adjustments
to both the survey form and field manual were made.
The third training session was from 13th-15
th January 2015) and was found necessary because ground
truthing was delayed due to approaching rainy season and Christmas holidays. The session was conducted at
Makutano Forest Station near Eldama Ravine. The aim of the three days training was to refresh the knowledge
on GNSS data collection and tree measurements. The training included navigation to sample plots, data
collection with Javad units, measurements of the growing stock on sample plots and data recording using the
most recent version of OFCM. One team was trained on the use of the FZ-G1 Toughpad, i.e. rugged field
tablet for data recording.
2 RTKLIB manual on post-correction of GNSS data collected with Javad Triumph-2
3Manual on GNSS data collection with Javad Triumph-2
11
4.1.3 Execution of ground truthing
Four field teams comprising staff members of KFS and DRSRS carried out the ground truthing between
January and April, 2015. Two teams continued the field measurements in the Aberdare test site in September,
2015 to complete the number of sample plots required. A team consisted of the following members:
2 Inventory foresters
1 Local forester (ranger if measurements were carried out in a national park)
1 GNSS expert
2 Rangers
1 Driver
1 Casual
One of the inventory foresters was the crew leader and the other the assistant crew leader. The crew leader
was responsible for the day to day quantity and quality of the work done by the crew.
A base station for the GNSS measurements was set up for each test site. The base station was manned by one
GNSS expert from DRSRS. The expert in charge of the base station was responsible for the equipment and the
continuous data collection at the base during the field measurements each day.
The field work started immediately after the third training session on 16th of January 2015 in the Kericho test
area. Measurements in the Kericho area were completed on the 13th of April. A total of 360 plots were
measured. The main challenge in that area was the poor quality of GNSS measurements. Quality checks of the
collected GNSS data were done continuously and teams advised to use auxiliary points if necessary in order to
improve accuracy of positioning. Due to delay in the availability of international correction data whivh took
1-2 days after data collection, the teams re-visited the plots in question to re-collect GNSS data. It was
therefore important to mark the plot centers adequately.
As the ground truthing was carried out more than one year after the laser data acquisition, changes were
expected in vegetation, especially in forest cover. Altogether 51 plots had to be relocated due to a disturbance
such as clear-felling or removal of trees since February 2014. Instructions for adjustment of plot location were
given in the field manual4
In Kericho test site, there were 11 inaccessible sample plots which were not measured. because they were
located on steep slopes of Mount Londiani with very poor accessibility.
Field measurements on the Aberdare site started on 13th of April and were interrupted on 18
th of April due to
the rainy season which made the accessibility of sample plots impossible. Two field teams returned to
Aberdare to measure additional plots from 2nd
to 11th of September 2015. The total number of plots
measured was 100. The main challenge in this area was the terrain which was not taken into account in the
allocation of sample plots. Based on the experiences of the field teams this should have been considered and
areas with very steep slopes excluded from reference data collection. Accurate elevation models and slope
rasters could have been created based on the laser data.
4Field Manual for LiDAR Assisted Estimation of Forest Resources in Kenya, 2016
12
Figure 7. A sample plot in the Aberdare test sitea on steep slope with Mount Kenya in the background.
On two sample plots (ID 2027 and 2047) the slant instead of horizontal distances were recorded. Distances for
border trees were checked in the field therefore this has no impact on the usability of the data recorded on
these plots. However these plots cannot be used due to uncertainty and further research should been to
detrmine the effect of such plots on size of sample plots. various sample plot sizes.
In the pilot inventory of the IC-FRA project in 2013, PDAs (Personal Digital Assistant) were used for data
collection. Since then the Open Foris data collection package had been developed and Open Foris Collect
Mobile could be conveniently used on any kind of Android device including smart phones. Therefore in the
ground truthing smart phones procured for the pilot inventory were used for data recording. However, smart
phones were not efficient enough for collection of bulky data. On numerous sample plots more than 100 trees
were recorded which caused the phones to slow down. As a solution to the problem, 4 field tablets were
procured. The field teams found them to be very efficient in data collection because the above mentioned
performance problems did not occur even when handling large amount of data. The tablets worked for an
entire working day with one battery charge, though backup batteries were carried throughout the survey.
5 Data analysis
Data analysis was carried out in Finland by the remote sensing expert using open source and free software
(QGIS, R, rtklib). Colleagues of Kenyan partner organizations were then trained in the use of tools and
methods. Two training sessions were organized, the first in Nakuru between 23rd
and the 27th of October 2015
and the second in Nairobi from 15th to 19
th February 2016 with participants from all partner organizations of
the project (KFS, KEFRI, DRSRS and UoE).
Data analysis includes the following steps:
Calculation of forest attributes using tree measurements
Calculation of sample plot locations using GNSS data
Processing laser data and aerial imagery
Calculating estimates of forest attributes
13
5.1 Forest attributes
Tree data collected on the sample plots was checked and cleansed prior to further processing. Built-in error
checks and warnings of Open Foris Collect forms minimized the amount of errors in the data. Error checks
made the data collection more resource intensive. However use of field tablets solved this issue. Calculations
of forest attributes for the sample plots were carried out with R statistical software. The calculations are
described in the following paragraphs and the symbols used in the equation are listed in Table 1.
Table 1. Symbols used in the equations.
Symbol Description
d Diameter at the breast height, 1.3 m
h Height
n Count, number
π Pi
WD Wood density
V Volume
AGB Above ground biomass
GW Green weight
DW Dead wood
Tree heights
Calculations started by estimating tree heights for all tally trees (measured trees) applying lmfor R-package. A
height model was generated with the help of sample trees having measured height and diameter at breast
height. The height model can be calibrated with random effects at desired levels presuming the levels have
sample trees. Due to the selection method5of sample trees there was at least one sample tree on each plot
where trees were measured. Hence, the height model was calibrated at the plot level, i.e. heights of tally trees
within each plot were computed with the same model parameters and these parameters were allowed to vary
between different plots. Dead or broken trees were not included in the modelling phase.
Volume and biomass of trees
In the next step, volume (m3/tree) and biomass (kg/tree) for trees were estimated. There were only few
species-specific functions available and thus, volume and biomass for most of the trees were estimated by
common functions. Although there were species-specific equations for the most common tree species, the
equations needed variables that were not measured in the field. The species-specific volume equations used
were (Wanene, 1975):
VEucalyptus = e(-4.3687+1.8139*log10(d)+1.1111*log10(h))
VCupressus =0.013152-0.00005069 * d2+0.0001769 * d * h+0.00002895 * d
2* h
VRadiata = e(-4.2244+1.9056*log10(d)+0.9277*log10(h))
VPinus = -0.00041-0.00005711 *d2 +0.0001352 * (d*h)+0.00003313 * (d
2*h)
The common volume equation used were (Chidumayo 2012):
Vcommon = 0.67 * π * (0.01 * d / 2)2 * h.
The biomass for trees (kg) was estimated with the following equations (Henry et al., 2013):
AGBAcacia = 3.7704* bd +1.1682
5Field Manual: LiDAR Assisted Estimation of Forest Resources in Kenya, 2016
14
AGBRhizophora = 0.8069 * d2.5154
,
AGBcommon = e(0.93 * log((d^2 * h))-2.97)
.
The biomass equation of Acacia included variable “bd” meaning the diameter at the base of the tree. That was
not measured and therefore, the diameter at breast height was used instead. Thus, biomass of Acacia was most
probably underestimated.
Volumes and biomasses of trees were transformed to values per hectare (m3/ha and t/ha) with tree expansion
factors. As trees were measured within the sample plot with 15 m radius, each tree had the same expansion
factor (10000 / (π * 152) = 14.147. The same expansion factor was used for shrubs, bamboos and climbers,
which were also measured within the sample plot with a radius of 15 m.
Volume and biomass of bamboo
Measurements for unevenly distributed bamboos were taken on the entire sample plot, while evenly
distributed bamboos were measured on two subplots with radius 2.82 m6. The volume of bamboos was
calculated with the equation:
Vbamboo = d2 – (d * 0.7)
2 / 4 * π * h * 0.8
(National Forest and Tree Resources Assessment 2005–2007)
The biomass of bamboos was calculated with the equations developed by Muchiri and Muga (2013). These
functions give biomass of merchantable bamboo. The green weight was calculated with the following
equations:
GWbamboo = -1.11 + 0.36 * d2 bamboo diameter > 3 cm
GWbamboo = -1.11 + 0.36 * 3.12 bamboo diameter < 3 cm
In the formula a constant“3.1” was used for small bamboos instead of diameter as the volume function gave
negative values for bamboo shoots less than 3 cm in diameter.
And then the dry weight (biomass) of bamboo was calculated with the formula:
AGBbamboo = 1.04 + 0.06 * d * GWbamboo
Volume and biomass of climbers
Climbers with a diameter equal or larger than 2 cm were measured in the field. For volume calculation the
cylinder equation was used:
Vclimber = π * (d/2)2 * h
The above ground biomass was calculated as follows:
AGBclimber= Vclimber * WD
WD was set to 0.6 (Henry et al., 2010).
Volume and biomass of small tress
Information on small trees was measured and recorded on four circular 2.82 m radius sub-plots. Amount (n) of
small trees with a height of at least 1.3 m and a diameter (d)of less than 50 mm was recorded along with the
average diameter (d) and height (h) in each subplot separately.
6Field Manual: LiDAR Assisted Estimation of Forest Resources in Kenya, 2016
15
Volume of small trees in a subplot was calculated using the volume equation of cone:
Vsmalltrees = (π * (d/2)2 * h)/3 * n
For biomass calculations an average wood density of 0.59 tons/m3 was used as tree species were not recorded
on the field(Henry et al., 2010).
AGBsmalltrees= Vsmalltrees* WD
After subplot-level calculations, results were expanded to the entire plot using expansion factor (152* π)/
(2.822* π) = 28.293. Then the same expansion factor as for treeswas applied.
Biomass of shrubs
The lack of suitable volume and biomass equations for shrubs was realized only in the data analysis phase.
The data recorded in the field was average height and canopy coverage in 4 classes, relative to the entire area
of the plot (<10%, 10%-39%, 40%-69%, >=70%), was insufficient for reliable estimates of biomass.. Most
equations for shrubs had explanatory variables which were not measured. However it was important to
account for biomass of shrubs as well.
Consequently, the following biomass model for shrubs (Mwakalukwa, 2014) was applied though estimation of
additional variables was required:
ln(AGBshrubs) = -1.8567+2.2145 * ln(d) + 0.1591 * ln(h)
Characteristics of shrubs in the study(Mwakalukwa, 2014) were:
height range: 2.5-8 m
DBH range: 1.8-21.5 cm
The mean DBH for individual shrubs on a sample plot was approximated using the DBH distribution of
indigenous small sampletrees measured on all sample plots in the test area. Only those trees were taken into
account, whose height exceeded by maximally 2.5 m the highest average shrub height. The range of average
shrub height recorded in the field was 1-7.5 m. Linear regression was fitted to predict the DBH of a tree using
the tree height as independent variable. The regression equation was used to calculate the mean DBH of
shrubs on the plot based on the average height measured in the field. The biomass model was then applied to
estimate the biomass of an average shrub on the plot.
Next, the number of shrubs on the plot was estimated with the help of the canopy coverage recorded in the
field. Absolute canopy coverage of shrubs on the plot was obtained from the mean relative canopy coverage in
each coverage class multiplied by the plot area:
1 - <10%: 0.05* (152 * π) = 35.34 m
2
2 - 10%-39%: 0.245 * (152 * π) = 173.18 m
2
3 - 40%-69%: 0.545 * (152 * π) = 385.24 m
2
4 - >=70%: 0.85 * (152 * π) =600.83 m
2
The percentage coverage of shrubs on the plot were estimated based on an average canopy cover of 10 m2
per
shrub, which was chosen after visual interpretation and measurements on the aerial imagery. After that, the
total shrub biomass on the plot was calculated and converted to biomass per hectare.
The results of the biophysical survey without added zero plot data are presented in the tables below. All
biomass values are above ground biomass (AGB). Total biomass is the sum of trees’, bamboo’s, climbers’,
small trees’ and shrubs’ biomasses. Estimations were carried out on total biomass.
16
Table 2. Results of biophysical survey in Kericho.
Biomass (Mg/ha) Minimum Maximum Mean Standard
deviation
Trees 0 715.778 141.565 158.081
Bamboo 0 43.400 0.652 4.405
Climbers 0 86.559 1.682 6.575
Small trees 0 22.493 0.584 1.899
Shrubs 0 22.878 2.567 3.886
Total biomass 0.183 715.778 147.051 157.370
Table 3. Results of biophysical survey in Aberdare.
Biomass (Mg/ha) Minimum Maximum Mean Standard
deviation
Trees 0 996.460 97.710 150.968
Bamboo 0 220.017 23.771 32.998
Climbers 0 14.602 0.385 1.753
Small trees 0 6.187 0.256 0.898
Shrubs 0 19.390 2.260 4.240
Total biomass 5.864 1003.250 124.382 146.614
5.2 Processing GNSS data
GNSS data was collected on all sample plots and a base station set separately for each test area. The aim of the
GNSS data collection was to locate the sample plot centers with an accuracy of 2-3 decimeters. This enables
the spatial matching of remotely sensed and ground truth data. In GNSS data processing rtklib version 2.427an
open source program package for GNSS positioning was used.
The first step was to calculate the position of the base station by utilizing the correction data collected and
organized by the International GNSS Service (IGS). Correction data (ephemerides, Earth rotation parameters
and satellite clock) as well as international base stationdata was downloaded for each day of data collection.
On the Kericho test site data recorded by MOIU (station at Moi University) and on the Aberdare test site by
RCMN(station at the Regional Centre for Mapping of Resources for Development in Nairobi) was used to
calculate the base station’s position.
Once the exact location of the base station was defined, it was used to correct the GNSS data collected by the
rovers, i.e. the GNSS units at the sample plots. All calculations were carried out in R by invoking rtklib’s
command line tools. For full details of the process see the manual.
In Kericho roughly one third of the GNSS measurements turned out to be poor partly because of very dense
vegetation, partly for inexplicable reasons. It was important to check the GNSS data once the correction files
became available. The person in charge of the base station would have had enough time to carry out the
calculations, but battery life of the computer in use was limited and no power source was available. In some
cases it was possible to revisit the plot and collect new GNSS data.
In Aberdare the accuracy level of GNSS positioning was considerably better. This might be due to different
vegetation cover: absence of dense plantations and good signal even under dense bamboo coverage.
Statistics on GNSS positioning are presented in the tables below. The figures are calculated based on the
positions collected on the field and post-processed in rtklib. Positions were recorded in 5 seconds interval at
the sample plot centers. After post-processing the magnitude of the positions’ scatter in East-West and North-
7 RTKLIB manual on post-correction of GNSS data collected with Javad Triumph-2
17
South direction was calculated. Higher scatter results in higher uncertainty in positioning. All data collected at
sample plot centers and auxiliary points used in positioning were taken into account.
Table 4. Statistics of GNSS data collected in Kericho.
Min (m) Max (m) Mean (m)
Scatter E-W 0.0 77.3 5.3
Scatter N-S 0.0 67.1 2.5
Table 5. Statistics of GNSS data collected in Aberdare.
Min (m) Max (m) Mean (m)
Scatter E-W 0.1 50.9 4.9
Scatter N-S 0.0 67.5 2.5
5.3 Pre-processing LiDAR data and aerial imagery
Remotely sensed data delivered by TerraTec had to be pre-processed before estimation of forest attributes.
Processing steps of LiDAR data and aerial imagery are described below
LiDAR data was delivered in las file format, 2.5x2.5 km blocks in Kenyan coordinate system Arc 1960/UTM
zone 37s. The laser returns were classified as ground, noise and unclassified; heights were calculated above
the reference ellipsoid.
The first processing step was done using las2txt.exe from the Lastools software package by rapidlasso GmbH
(http://rapidlasso.com/lastools/). Las files were transformed to ASCII files which are compatible with the tools
used in further steps. The output ASCII files contained the following attributes: x, y, z coordinates, intensity,
class of return, amount of returns of given pulse, number of return.
All steps following were executed in Linux environment using awk Linux tool and a software package coded
by Luke’s researcher Juho Pitkänen (tool used for the particular step). Firstly, points classified as noise and
with zero amounts of returns were removed. The latter was caused by a bug in the processing software of the
LiDAR data (awk).Then the height above ground (HAG) for each laser return was calculated using k-nearest
neighbor method combined with inverse distance weighting (jg_txt_height).The number of nearest neighbors
was set to 4and a weighting parameter to 1 (the closer the nearest neighbor, the higher the weight). The results
of that step were used in 3D feature calculations.
In the next step the first returns (number of return=1) were selected and Digital Canopy Models (DCM) were
created (jg_poi_idw). The points with a HAGvalue of 60 meters and above as well as below -1 meters were
excluded and negative HAG values were set to zero. The number of nearest neighbors was set to 3 and a
weighting parameter to 2.The resolution of the resulting rasters was 0.5 m.
Other features calculated were the Haralick texture features (Haralick et al., 1973). Texture features are based
on grey-level co-occurrence matrices. In order to decrease variation in DCMs, pixel values were reclassified
into 16 classes. Due to the reclassification pixels with close values (e.g. 24.5, 24.3 or 0.4, 0.2, etc.) got the
same value and the textural variation between different vegetation or land cover types was detectable. The
same reclassification matrix was used for the entire set of DCMs. The matrix was calculated based on one
DCM image, which had the widest range of pixel values. In order to include all pixel values of all the DCM
images in the reclassification, the upper limit of the last class was set to infinite.
Two sets of aerial images were delivered, true color and color infrared, both with 25 cm resolution. Only color
infrared images were used in the exercise. Images were first resampled to 50 cm which was sufficient for
18
estimating forest attributes at the stand level. The images were reclassified for calculation of Haralick’s
texture features in the same way as the DCMs. All raster image sets were merged using a virtual raster format.
These virtual mosaics were used in all following steps.
5.4 Feature selection and biomass estimation
As it was mentioned in the introduction, features extracted from laser data and aerial imagery can be used to
estimate forest attributes, e.g. biomass. Features are numerical representations of characteristics of remotely
sensed data covering a certain area. The total amount of features extracted for reference sample plots was 130.
However some of the features might not correlate with forest attributes of interest and some features might
correlate between each other. In order to exclude such features feature selection needed to be done. Feature
selection was carried out for the two test sites separately.
In both feature extraction and biomass estimation the k-nearest neighbor (k-NN) method was applied. In the k-
NN method forest attributes are estimated for a certain areausing knearest neighbors of a reference data set
with known attributes. Nearest neighbors are selected based on the similarity of features extracted from the
target area and the features of the reference data set’s members.
5.5 Feature extraction
3D features were extracted from laser point clouds using R. For each sample plot laser returns within a
distance of 15 meters from the plot center were extracted and a set of features calculated based on the above
ground height of each return.
Table 6. Table of 3D features.
Name Description Literature
avgf Mean values H ; canopy points, H >= 2 m,
first pulse
Packalén et al. (2008)
havgl Mean values H ; canopy points, H >= 2 m,
last pulse
Packalén et al. (2008)
hstdf Std of H ; canopy points (H >= 2 m), first
pulse
Packalén et al. (2008)
hstdl Std of H ; canopy points (H >= 2 m), last
pulse
Packalén et al. (2008)
h0f,h5f,...,h90f Percentiles of the first pulse laser canopy
heights for
0%, 5%, 10%, 20%, ... , 90%, 95%, 100%
(m);
Naesset (2004a);
Packalén&Maltamo (2006)
s.615
h0l,h5l,...,h90l Percentiles of the last pulse laser canopy
heights for
0%, 5%, 10%, 20%, ... , 90%, 95%, 100%
(m)
Naesset (2004a);
Packalén&Maltamo (2006)
s.615
hcvf Coefficient of variation of the first pulse laser
canopy heights (%)
Naesset (2004a)
hcvl Coefficient of variation of the last pulse laser
canopy heights (%)
Naesset (2004a)
vegf Proportion of above-ground hits, first pulse Packalén et al. (2008)
vegl Proportion of above-ground hits, last pulse Packalén et al. (2008)
19
Note:
Height range is
divided into
equal intervals:
(see d*
variables
below)
The range between the lowest laser canopy
height (>2 m) and the maximum canopy
height, as derived from the first pulse canopy
height distribution, was divided into 10
fractions of equal length. Canopy densities
were then computed as the proportions of first
pulse laser hits above fraction
no. 0 (2m), 1, . . . ., 9 to total number of first
pulses.
Corresponding densities were also computed
for the last pulse data.
Naesset (2004a)
(*+++) height is divided
into classes of equal length,
(not equal proportions)
d0f,...d9f Canopy densities corresponding to the
proportions of first pulse laser hits above
fraction no. 0, 1,...,9 to total number of first
pulses
Naesset (2004a), p.171
(*+++)
d0l,...,d9l Canopy densities corresponding to the
proportions of last pulse laser hits above
fraction no. 0,1,...,9 to total number of last
pulses
Naesset (2004a), p. 171
(*+++)
p20f,...p95f Canopy height percentiles for 20%, 40%,
60%, 80%, and 95% (h_20,...,h_95) were
computed and the corresponding
proportional canopy densities (p20,...,p95)
were computed for each of these quantiles.
Canopy density p_x is the proportion of ALS
hits accumulated at height h_x.
Packalén&Maltamo (2008, p. 1753),first
pulses.
Packalén&Maltamo (2008)
s.1753
p20l,...p95l Canopy height percentiles for 20%, 40%,
60%, 80%, and 95% (h_20,...,h_95) were
computed and the corresponding
proportional canopy densities (p20,...,p95)
were computed for each of these quantiles.
Canopy density p_x is the proportion of ALS
hits accumulated at height h_x.
Packalén&Maltamo (2008, p. 1753), last
pulses.
Packalén&Maltamo (2008)
s.1753
pcgf Proportions of canopy hits versus ground hits
(0-100%), first pulses
Packalén&Maltamo(2006)
s.615
pcgl Proportions of canopy hits versus ground hits
(0-100%), last pulses
Packalén&Maltamo (2006)
s.615
pghf Proportion of ground hits, first pulses Packalén&Maltamo (2006)
s.615
Raster features were extracted from the green, red and near infrared channels of the aerial imagery, and from
the DCMs using Juho Pitkänen’s toolset.The raster features included11 Haralick texture features which were
calculated using a 26.5 by 26.5 meters processing window around the plot center. In the processing window
pixels 2.5 meters apart were taken into consideration enabling the detection of textural structure of forest
canopy. The Haralick features can be calculated at 0, 45, 90 and 135 degrees. However, these features are
highly correlatedwith each other, and therefore, averages of the four directions are calculated automatically by
the tool.
20
Additionally mean and standard deviation of pixel values falling inside each sample plot were calculated in R.
A pixel was considered to be inside the sample plot when its center was not further than 15 meters from the
plot center.
Table 7. Table of raster features.
Name Description Literature
*_ASM Angular Second Moment
Haralick et al., 1973
*_Contr Contrast
*_Corr Correlation
*_Var Variance
*_IDM Inverse Diff Moment
*_SA Sum Average
*_SV Sum Variance
*_SE Sum Entropy
*_Entr Entropy
*_DV Difference Variance
*_DE Difference Entropy
*_mean Average values of aerial image bands and DCMs Tuominen&Haapanen (2008)
*_sd Standard deviations of aerial image bands and
DCMs
Tuominen&Haapanen (2008)
Asterisks mark prefixes in feature names:
h – DCM
green – aerial image, green band
red – aerial image, red band
nir – aerial image, near infrared band
5.6 Feature selection
Feature selection was necessary in order to use an optimal set of features in the estimation of forest attributes.
Feature selection was done using a genetic-algorithm-based approach implemented in R (genalg package).
First, a certain amount (population size) of feature sets (chromosomes) was randomly generated. The fitness
of chromosomes was evaluated using the leave-one-out cross-validation method. In the cross-validation
biomass was estimated for one sample plot by means of the features (genes) in the feature set and the
measured biomass values of all the other sample plots in the reference data. The estimation was done using the
k-NN method. Selectionof nearest neighbors was based on weighted Euclidean distances in the f-dimensional
feature space where f equals to the number features in the feature set. Larger weight parameter (g) gives less
weight to a nearest neighbor on a larger distance. Cross-validation was done for each chromosome n times
where n equals to the amount of sample plots.
Given the estimates and measured biomass values root mean square error (RMSE) and bias were calculated.
The genetic algorithm was aiming to minimize RMSE and bias calculated for the chromosomes. This was
done by carrying out the aforementioned stepsi times where i is the number of iterations set by the user.
Between iterations chromosomes exchanged genes (mutation) catalyzed by parameters such as mutation
chance (the chance that a gene in the chromosome mutates (changes)) and elitism (the number of
chromosomes kept unchanged into the next generation (iteration)).Optimal number of nearest neighbors and
weight for distance weighing was selected based on a number of test runs. The optimal chromosome i.e.
feature set was the one with minimal RMSE and bias. Finally weights for features in the optimal feature set
were calculated using the same approach. Feature weighing enabled further minimization of RMSE and bias
(see Table 8.)Table 9 and Table 10). Feature selection was carried out first separately for the test sites, then
taken sample plots from both sites into account.
21
In the Kericho test site the optimal amount of nearest neighbors (k) was found to be 4 and the parameter for
inverse distance weighing (g) of 1.5 was used. The total amount of sample plots used was 352. Some plots
were excluded due to uncertain location information and 3 so called zero plots (plots with zero woody
biomass) were added. Feature selection resulted in the following set of features (see Table 6 and Table 7 for
descriptions):
3D features: havgf, h10f, h60l, h70l, vegl, d0f, d3f, d2l, pghf, i40f;
Raster features extracted from DCMs: h_mean, h_sd, h_ASM, h_Contr, h_IDM, h_DV;
Raster features extracted from aerial image bands: nir_sd, nir_Entr, nir_DE.
The total number of selected features was 19, of which 3 features were extracted from aerial image bands.
This was to be expected as three dimensional datasets tend to have a strong correlation with forest attributes
like biomass or total growing stock.
Table 8. Statistics of estimates usingall and selected features in Kericho.
Total AGB (Mg/ha) RMSE Bias Relative RMSE
(%)
Relative bias
(%)
All features in use 69.3 -6.2 47.5 -4.2
Selected features 50.3 0.0 34.5 0.0
Selected features
with weighing 48.0 -0.1 32.9 0.0
In the Aberdare test site the optimal amount of nearest neighbors (k) was found to be 3 and the parameter for
inverse distance weighing (g) of 2.5 was used. The total amount of sample plots used was 101. Some plots
were excluded due to uncertain location information and zero plots were added. Feature selection resulted in
the following set of features:
3D features: h5f, h20f, h30f, h60l, h95l, d5l, p20f, p40f, p80f, pcgf;
Raster features extracted from DCMs: h_sd, h_ASM, h_SA, h_DV;
Raster features extracted from aerial image bands: red_ASM.
The total number of selected features was 15, of which one feature was extracted from aerial image bands.
Table 9. Statistics of estimates using all and selected features in Aberdare.
Total AGB (Mg/ha) RMSE Bias Relative RMSE
(%)
Relative bias
(%)
All features in use 89.2 -7.5 73.9 -6.2
Selected features 72.5 -0.1 60.1 -0.1
Selected features
with weighing 71.1 0.0 58.9 0.0
When processing both test sites at the same time the optimal amount of nearest neighbors (k) was found to be
6 and the parameter for inverse distance weighing (g) of 3 was used. The total amount of sample plots used
was 453. Some plots were excluded due to uncertain location information and zero plots were added. Feature
selection resulted in the following set of features:
3D features: havgf, h30f, h85f, h95f, h70l, h85l, hcvf, vegl, d0f, p20f, p95f, p20l;
Raster features extracted from DCMs: h_mean, h_Var;
Raster features extracted from aerial image bands: nir_Contr, red_DE.
The total number of selected features was 16, of which two features were extracted from aerial image bands.
22
Table 10. Statistics of estimates using all and selected features in Aberdare and Kericho.
Total AGB (Mg/ha) RMSE Bias Relative RMSE
(%)
Relative bias
(%)
All features in use 73.1 -4.7 52.2 -3.3
Selected features 60.6 0.0 43.3 0.0
Selected features
with weighing 57.9 0.0 41.3 0.0
5.7 Biomass estimation
The main objective of the exercise was to produce a biomass map over the test areas. For the biomass
estimation, grids covering each test site were generated. The size of grid cells was defined taking the area of
the sample plots used for ground truthing and practical aspects into consideration. The area of one sample plot
was approximately 707 m2 corresponding to a cell size of 26.59 x 26.59 m
2. As round coordinates are easier to
work with, a grid cell size of 25x25 m was chosen. The comparativeness of sample plots’ and grid cells’
features is not influenced by the negligible difference of the area covered (707 vs 625 m2).
Selected features were extracted for each grid cell and the k-NN method was used to calculate biomass
estimates for each cell (see section0). For the estimation only the selected feature set and the reference data of
the respective test site was used. Parameters k and g were set to the same value as in the feature selection. The
results were saved in vector and raster format. The test sites were processed separately.
6 Results
Table 11and Table 12shows the statistics of measured and estimated total biomass on reference plots in
Kericho and Aberdare respectively. Negative bias means that the method applied underestimated biomass in
the reference data set. It is important to mention that error estimates are only valid for the reference data set
and cannot be used as error estimates for the entire test site. Estimates on higher scale like forest stand level
are more accurate as errors of individual estimation units are balanced out.
Table 11. Statistics of measured and estimated total biomass on sample plots in Kericho.
Total AGB (Mg∙ha
-1) Measured Estimated
Range (min-max) 0-716 3-636
Mean 147 146
SD 157 149
RMSE 48.0
Bias -0.1
Relative RMSE (%) 32.9
Relative bias (%) 0.0
Table 12. Statistics of measured and estimated total biomass on sample plots in Aberdare.
Total AGB (Mg∙ha-1
) Measured Estimated
Range (min-max) 6-1003 0-477
Mean 124 121
SD 147 117
RMSE 71.1
Bias 0.0
Relative RMSE (%) 58.9
Relative bias (%) 0.0
23
Table 13. Statistics of measured and estimated total biomass on sample plots on both test areas.
Total AGB (Mg∙ha-1
) Measured Estimated
Range (min-max) 0-1003 0-477
Mean 140 140
SD 155 141
RMSE 57.9
Bias 0.0
Relative RMSE (%) 41.3
Relative bias (%) 0.0
In Figure 8, Figure 9 and Figure 10, estimated biomass is plotted against biomass calculated from field
measurements in Kericho, Aberdare and taken both test sites into account, respectively.
Figure 8. Scatter plot of measured and estimated AGB on sample plots in Kericho.
24
Figure 9. Scatterplot of measured and estimated AGB on sample plots in Aberdare.
Figure 10. Scatterplot of measured and estimated AGB on sample plots on both test areas.
Rasterized representation of the results is shown in Figure 11. The resolution of the biomass map is 25x25 m,
biomass increases from red to green.
25
Figure 11. False color infrared image and biomass map in Kericho.
7 Discussion
Plot level accuracy of the biomass estimates was unexpected poor. Reasons for that were explored and are
discussed as the following. Visual verification of the biomass maps revealed that the ground truthing was not
completely successful. In Kericho test site most of the forested areas were uniform plantations, however,
biomass estimates for neighboring cells in the same plantation could vary significantly. In Figure 12for the
grid cell around the center 528 Mg/ha biomass was estimated whereas for the neighboring cell to the northeast
only 419 Mg/ha. To verify the similarity of the cells digital canopy model is set as background in Figure 13.
Figure 12. Biomass estimates with false color infrared imagery.
26
Figure 13. Biomass estimates with DCM.
One possible reason for errors of this kind is the uncertainty in the positioning of sample plots. Results
showed that Javad GNSS units could not record the position accurately under dense vegetation cover (see
Figure 14). In the Kericho test site a high number of unthinned, extremely dense coniferous plantations were
found. About one third of the sample plot positions were poor which means that post-processed GNSS records
for the plot center had an average scatter of approximately 7 and 14 meters in East-West and North-South
direction respectively. In some cases all points collected on the same sample plot fell within several square
millimeters (see Figure 15).
Figure 14. Uncertain plot position under dense canopy cover in Kericho.
27
Figure 15. Successful GNSS measurement in Kericho (grid size 1 mm).
It is important to mention, that GNSS data collected in the Aberdare test site was considerably better. This
might be due to the fact that typical vegetation type was bamboo in that area which apparently did not
interfere with GNSS satellite signals (see Figure 16).
Figure 16. Successful GNSS measurement in Aberdare (grid size 5 mm).
28
Figure 17. Grid cells with very light vegetation
Another typical error in the biomass map was that grid cells with nearly no vegetation got inaccurate biomass
estimates, as can be observed in Figure. This is due to the lack of similar sample plots, in this case sample
plots with low biomass values, in the reference data set. Nearest neighbors selected from the reference data
were in fact much too different from the target grid cell which led to overestimation of biomass.
In addition to the issues with positioning and the k-NN method, there were problems concerning the accuracy
of the field estimates. Measurements on shrubs were limited to average height and canopy cover expressed in
4 levels. Review of the literature revealed that field measurement methods and biomass models for African
shrub species are not common. Based on the articles found, and the attributes measured, only a vague estimate
of biomass of shrubs was possible. The accuracy of biomass estimates for shrubs in the ground truth is of great
importance as the biomass of shrubby vegetation can be a significant portion of total above ground biomass in
certain areas. In some sample plots shrubs were the only vegetation cover. On the sample plot shown in Figure
shrub coverage was over 70 % and the average shrub height 2.5 m. There was no other woody vegetation
recorded on the plot.
29
Figure 18. Sample plot in the Kerichotest site covered by shrubs.
There was a significant difference between the RMSE values calculated for the two test sites (33% and 59%).
One reason for the higher RMSE in the Aberdare test site was most probably high variation in vegetation
which was not properly covered in the ground truth. In the Kericho area (approx. 425 km2) a total of 349
sample plots were used (1.22 km2/plot)while in Aberdare (approx. 175 km
2) only 98 plots were usable (1.79
km2/plot).In Kericho the vegetation cover was mostly uniform plantations as opposed to natural vegetation in
Aberdare with more variance, which could not be covered by the small amount of plots measured. Another
reason is plot no. 2141 with extremely high amount of biomass (1003 Mg/ha) caused by one tree on the plot
with DBH over 2 m.
The second highest measured biomass was 525 Mg/ha. This means that using 3 nearest neighbors the estimate
for plot no. 2141 will be heavily underestimated as it is shown in Figure 9 (point in the right of the plot
area).The effect of this outlier was tested by excluding it from the feature selection process. The relative
RMSE decreased from 58.9 % to 41.3 %. Exclusion of outliers from the reference data is however not
advisable. Good practice is to measure a representative set of reference sample plots in the target area. As it
was mentioned in section 0 terrain should be taken into account during sample plot allocation.
In the calculation of 3D features laser pulses were considered to be reflected by vegetation if their above
ground height was not less than 2 meters. This approach is widelyused in boreal forestsbut needs to be
reconsidered in order to match field measurements (minimum height of shrubs, trees to be recorded) to the 3D
features.
8 Future use of the data
The LiDAR data and aerial imagery acquired in the IC-FRA project together with the sample plot data offer
several possibilities for further research. The most interesting subject would be to investigate how RMSE and
bias change by decreasing sample plot radius. The distance of each tallied tree from the plot center was
recorded in the field. This makes it possible to carry out feature selection by using trees, for example, within
12 or 9 meters radius. Smaller plot radius would increase time and cost efficiency of ground truthing and
make it possible to estimate attributes for finer resolution (20x20 m grid).
30
Due to lack of time in the IC-FRA project feature selection and forest attribute estimation was carried out only
for biomass. Other forest attributes correlating with laser and image features, for example, total growing stock
or Lorey’s height could be estimated similarly.
For verification and to get an idea of stand level accuracy, the estimated forest attributes should be compared
with a separate field data collected with, for example, dense systematic sampling. With the help of digital
stand delineation, estimated forest attributes for stands can be calculated as an average of grid cells within
each stand. This way the results would be comparable with already existing data, for example, from the forests
managed by the Kenya Forest College in Londiani, as it was initially planned for. The ALS data acquired
could be utilized also in teaching of forestry students to introduce modern technology in forest data collection.
31
References
http://cs.uef.fi/~lamehtat/documents/lmfor.pdf
Chidumayo, E.N.,2012,Assessment of Existing Models for Biomass and Volume Calculations for Zambia,
Report prepared for FAO-Zambia integrated land use assessment (ILUA) phase II project, 58 p.
Haralick R., Shanmugam K., Dinstein I., 1973, Textural Features for Image Classification, IEEE Trans. on
Systems, Man and Cybernetics, vol. SMC–3, No. 6: 610–621 p.
Henry, M., Besnardd A., Asantee W.A., Eshunf J., Adu-Bredug S., Valentinic R., Bernouxb M., Saint-Andréh
L., 2010, Wood density, phytomass variations within and among trees, and allometric equations in a tropical
rainforest of Africa, Forest Ecology and Management, vol. 260: 1383 p.
Mwakalukwa, E.E., Meilby H., Treue T., 2014, Volume and Aboveground Biomass Models for Dry Miombo
Woodland in Tanzania. International Journal of Forestry Research, vol. 2014, article ID 531256
Næsset, E., 2002, Predicting forest stand characteristics with airborne scanning laser using a practical two-
stage procedure and field data, Remote Sensing of Environment, vol.80: 88-90 p.
Næsset, E., 2002, Predicting forest stand characteristics with airborne scanning laser using a practical two-
stage procedure and field data, Scandinavian Journal of Forest Research, vol19:164-179 p.
Packalén, P., Maltamo, M., 2006, Predicting the plot volume by tree species using airborne laser scanning and
aerial photographs, Forest Science, vol. 52(6), 611–622 p.
Packalén P., Maltamo M., 2008, Estimation of species-specific diameter distributions using airborne laser
scanning and aerial photographs, Canadian Journal of Forest Research, vol.38(7): 1750-1760 p.
Packalén, P., Suvanto, A., Maltamo, M., 2008, A two stage method to estimate species-specific growing stock
by combining ALS data and aerial photographs of known orientation parameters, Manuscript IV in
Dissertationes Forestales 77
Tuominen S., Haapanen R., 2013, Estimation of forest biomass by means of genetic algorithm-based
optimization of airborne laser scanning and digital aerial photograph features, Silva Fennica, vol. 47, no. 1,
article ID 902
Vehmas, M., Peuhkurinen, J., Eerikäinen, K., Packalén, P.,Maltamo, M., 2009, Identification of boreal forest
stands with high herbaceous plant diversity using airborne laser scanning, Forest Ecology and
Management,vol.257: 46-53 p.
Wanene, A.G. 1975, A provisional yield table for Pinus Patula grown in Kenya. FD technical note 143.