improving capacity in forest resources assessment in kenya (ic … · nyeri site was delineated...

32
Improving Capacity in Forest Resources Assessment in Kenya (IC-FRA) Technical Report on LiDAR Assisted Estimation of Forest Resources in Kenya May 2016

Upload: tranngoc

Post on 12-Jul-2019

213 views

Category:

Documents


0 download

TRANSCRIPT

Improving Capacity in Forest Resources Assessment

in Kenya (IC-FRA)

Technical Report on

LiDAR Assisted Estimation of Forest Resources

in Kenya

May 2016

1

© DRSRS, 2016

All rights reserved. No part of this guide may be reproduced or transmitted in any form or by any means,

electronic, electrostatic, magnetic or mechanical, including photocopying or recording on any information

storage and retrieval system, without prior permission in writing from either Director Natural Resources

Institute Finland (Luke), Kenya Forest Service (KFS), Kenya Forestry Research Institute (KEFRI),

Department of Resource Surveys and Remote Sensing (DRSRS); and Vice Chancellor University of Eldoret

(UoE) except for short extracts in fair dealing, critical scholarly review or discourse with acknowledgement.

Prepared by:

The Project Improving Capacity in Forest Resources Assessment in Kenya (IC-FRA) implemented 2013–2015

Compiled by:

András Balázs, Helena Haakana, Pekka Hyvönen, Heikki Parikka, Fredrick Ojuang, Faith Mutwiri, Peter

Nduati

Cover caption:

A true-colour 3D point cloud near Londiani, Kenya

2

Table of contents

Table of contents ..................................................................................................................................... 2

List of figures .......................................................................................................................................... 3

List of tables ............................................................................................................................................ 3

1 Introduction ....................................................................................................................................... 5

2 Test sites ............................................................................................................................................ 5

3 Data acquisition ................................................................................................................................. 6

4 Ground truthing ................................................................................................................................. 8

4.1 Preparation for field work ..................................................................................................... 8

4.1.1 Allocation of reference sample plots ..................................................................................... 8

4.1.2 Training of field teams ........................................................................................................ 10

4.1.3 Execution of ground truthing ............................................................................................... 11

5 Data analysis ................................................................................................................................... 12

5.1 Forest attributes ................................................................................................................... 13

5.2 Processing GNSS data ......................................................................................................... 16

5.3 Pre-processing LiDAR data and aerial imagery .................................................................. 17

5.4 Feature selection and biomass estimation ........................................................................... 18

5.5 Feature extraction ................................................................................................................ 18

5.6 Feature selection .................................................................................................................. 20

5.7 Biomass estimation .............................................................................................................. 22

6 Results ............................................................................................................................................. 22

7 Discussion ....................................................................................................................................... 25

8 Future use of the data ...................................................................................................................... 29

References ............................................................................................................................................. 31

3

List of figures

Figure 1. Delineation of the Kericho test area near Eldama Ravine.....................................................................5

Figure 2. Delineation of the Aberdare test area near Nyeri. .................................................................................6

Figure 3. False colour aerial imagery (near-infrared, red and green channels) and laser data coverage (red

line) of the Kericho test area, approximately 425 km2. .......................................................................7

Figure 4. False colour aerial imagery (near-infrared, red and green channels) and laser data (red line)

coverage of the Aberdare test area, approximately 175 km2. ..............................................................7

Figure 5. Plot of within-cluster sums of squares vs. number of clusters, Kericho dataset. ..................................9

Figure 6. Plot of within-cluster sums of squares vs. number of clusters, Aberdare dataset. ................................9

Figure 7. A sample plot in the Aberdare test sitea on steep slope with Mount Kenya in the background. ........12

Figure 8. Scatter plot of measured and estimated AGB on sample plots in Kericho. ........................................23

Figure 9. Scatterplot of measured and estimated AGB on sample plots in Aberdare. .......................................24

Figure 10. Scatterplot of measured and estimated AGB on sample plots on both test areas. .............................24

Figure 11. False color infrared image and biomass map in Kericho. .................................................................25

Figure 12. Biomass estimates with false color infrared imagery. .......................................................................25

Figure 13. Biomass estimates with DCM. ..........................................................................................................26

Figure 14. Uncertain plot position under dense canopy cover in Kericho. ........................................................26

Figure 15. Successful GNSS measurement in Kericho (grid size 1 mm). ..........................................................27

Figure 16. Successful GNSS measurement in Aberdare (grid size 5 mm). ........................................................27

Figure 17. Grid cells with very light vegetation ................................................................................................28

Figure 18. Sample plot in the Kerichotest site covered by shrubs. .....................................................................29

List of tables

Table 1. Symbols used in the equations. ............................................................................................................13

Table 2. Results of biophysical survey in Kericho. ............................................................................................16

Table 3. Results of biophysical survey in Aberdare. ..........................................................................................16

Table 4. Statistics of GNSS data collected in Kericho. ......................................................................................17

Table 5. Statistics of GNSS data collected in Aberdare. ....................................................................................17

Table 6. Table of 3D features. ............................................................................................................................18

Table 7. Table of raster features. ........................................................................................................................20

Table 8. Statistics of estimates usingall and selected features in Kericho. .........................................................21

Table 9. Statistics of estimates using all and selected features in Aberdare. ......................................................21

Table 10. Statistics of estimates using all and selected features in Aberdare and Kericho. ...............................22

Table 11. Statistics of measured and estimated total biomass on sample plots in Kericho. ...............................22

Table 12. Statistics of measured and estimated total biomass on sample plots in Aberdare. .............................22

Table 13. Statistics of measured and estimated total biomass on sample plots on both test areas. ....................23

4

Acknowledgements

This Technical report on LiDAR Assisted Estimation of Forest Resources in Kenya is a product of effort by a

large number of people and institutions. The compilers of this report would like to extend their gratitude to

institutions that provided staff and allowed them to support this noble initiative.

We most sincerely thank the Government of Finland and Government of Kenya for providing financial

support for the IC-FRA Project that has built capacity in terms of staff, equipment and knowledge in LiDAR

Assisted Estimation of Forest Resources. Special thanks are extended to Directors of Luke, DRSRS, KEFRI,

KFS; and the Vice Chancellor University of Eldoret for allowing their staff to participate in the Project.

This report will go a long way to inspire similar studies and promote application LiDAR Assisted Estimation

of Forest Resources in Kenya.

5

1 Introduction

Remote sensing has continued to play an increasingly vital role in natural resources assessment. Airborne

Laser Scanning (ALS) is one of the most dynamically developing methods used in local forest inventories

providing biomass and growing stock estimates. Laser scanning, also known as LiDAR (Light Detection and

Ranging) provides three dimensional point data (point cloud) whose characteristics are highly correlated with

forest attributes like total growing stock or above ground biomass. Based on the correlation between forest

attributes and laser/image features, attributes measured on sample plots can be estimated fully covering an

area of interest. The estimation not only provides figures but also spatial data as attributes are estimated for

each grid cell laid over the target area.

Airborne laser scanning is an active remote sensing technology. Airborne LiDAR systems consist of laser

sensor, computing and data storage unit, GPS unit and Inertial Measurement Unit (IMU). The laser scanner

emits a laser pulse and measures the time elapsed between emission and return(s) of the pulse. Based on the

speed of light, the elapsed time recorded and the exact position (GPS, IMU) of the sensor, the position of the

object which reflected the pulse can be calculated. Nowadays laser scanners are able to record up to 5-6

returns/pulse. Aircrafts used for laser scanning purposes are usually fixed wing planes. Depending on the

scanner and flying altitude different point densities can be reached in the final product. Higher flying altitude

results in lower point density and lower costs and lower flying altitude increases both costs and point density.

Point density for forest inventory purposes range from 0.5 to 2-3 points·m-2

.

In the “Improving Capacity in Forest Resources Assessment in Kenya” (IC-FRA) project, ALS assisted

estimation of above ground biomass was undertaken over two test areas in Kenya. The aim of the exercise was

to introduce a modern forest inventory technique and explore the possibility to use the acquired data in

estimation of forest stand characteristics, firstly for forest management purposes and secondly for

inventorying areas with poor accessibility.

The main phases of the exercise were:

- ALS data acquisition,

- Ground truthing and

- Data analysis.

2 Test sites

Two test sites were selected for the exercise. The first test site was located northwest from Nakuru near

Eldama Ravine in Baringo, Nakuru and Kericho Counties (Figure 1) and the other in Aberdares forest west of

Nyeri (Figure 2). The Kericho site was covered typically by plantation forests, and the Aberdare test site with

natural forests and bamboo forests. The total areas of the Kericho and Aberdare test sites were approximately

425 km2 and 175 km

2, respectively.

Figure 1. Delineation of the Kericho test area near Eldama Ravine.

6

Figure 2. Delineation of the Aberdare test area near Nyeri.

The Kericho site was selected in order to test ALS in estimation of forest characteristics for local forest

management planning. In addition to state owned plantation forests, the site was delineated to cover the

training forest around Kenya Forestry College (KFC) in Londiani. Nyeri site was delineated inside the

Aberdare national park to test the feasibility of laser data for estimating forest characteristics in remote areas

with natural vegetation.

3 Data acquisition

The Airborne Laser Scanning (ALS) and aerial image acquisition service was tendered in December

2013through a Finnish channel for national and international public procurements

(www.hankintailmoitukset.fi). The service was awarded to TerraTec AS from Norway. The data acquisition

was carried out on 10th-13

thof February, 2014. DRSRS provided the flight service with Cessna 208 (Caravan)

aircraft.

Flying altitude was 1550 m above ground, laser data density 2 pts/m2 and ground resolution of the aerial

imagery 25 cm. The aerial imagery was delivered as false colour infrared (near-infrared, red and green

channels) and true colour (red, green and blue channels).Full technical details about data acquisition is

provided in TerraTec’s final report attached to this document (Appendix 1).

The costs of data acquisition including aerial imagery and excluding ground truthing were 2.15 € per hectare

(aircraft: 26.000 €, data acquisition and processing: 103.000 €, area covered: 60000 ha).

7

Figure 3. False colour aerial imagery (near-infrared, red and green channels) and laser data coverage (red

line) of the Kericho test area, approximately 425 km2.

Figure 4. False colour aerial imagery (near-infrared, red and green channels) and laser data (red line)

coverage of the Aberdare test area, approximately 175 km2.

8

4 Ground truthing

In order to produce reliable estimates of various forest attributes including total biomass, field data, i.e.ground

truthing is required. Ground truthing was carried out by measuring reference sample plots in the test areas. It

was important that the location of the reference plots could be measured at the accuracy level of 2-3

decimetres. In order to meet this requirement, GNSS (Global Navigation Satellite System) devices were

procured. The field work included two types of measurements: collection of GNSS data and collection of

biophysical data of the standing vegetation.

The optimal allocation of reference plots was guided by the already acquired laser and aerial image data to

optimize the sampling design bearing in mind both results’ accuracy and financial limitations.

4.1 Preparation for field work

The preparation phase for ground truthing included allocation of sample plots to be measured, compilation of

a field manual and separate manuals for GNSS data collection and post-processing, and training of field

teams. The field manual1provides detailed information on how the field measurements were carried out.

4.1.1 Allocation of reference sample plots

The laser data and aerial imagery were utilized in the sampling design for the ground truthing. TerraTec

provided the pre-processed data in April, 2014 and the plan for the field measurements was prepared by

September, 2014. The main objective was to select a representative sample of the forests within the test sites

for the interpretation of the laser data. The test sites were first classified into homogenous areas according to

the laser and image data properties, and sample plots were then allocated to these classes proportionally to the

class area. The budget for the field work was a limiting factor and determined the total amount of sample plots

that could be measured in the field. The sampling design is described in detail in the following.

A grid with a cell size of 26.5 x 26.5 m was generated over both test areas. The cell size corresponded to the

area of a reference sample plot which radius was chosen to be 15 m. For each grid cell four features were

calculated from the laser and aerial image data. The selected features correlated strongly with some important

characteristics of land-cover and vegetation types, thus making the classification of grid cells possible. The

selected features were:

h85f: laser feature, 85 percentile of first pulse canopy heights (85% of the first pulse canopy height

observations fall below this height), correlates strongly with vegetation height;

vegf: laser feature, proportion of vegetation returns (returns from at least 2 m above ground; 2 m is

widely used as limit for vegetation pulses) from all returns (first returns), correlates strongly with

vegetation cover;

meannir: raster feature, average of pixel values of near-infrared channel, correlates with different

vegetation types and tree species groups;

idmnear: Inverse Difference Moment, raster texture feature of near-infrared channel, also called

Homogeneity, correlates with forest-non-forest land-cover types (forest: low homogeneity) and

canopy structure

Based on these features the grid cells were clustered (classified) into homogenous classes using the kmeans

tool of the Stats package in R statistical software. The tool k-means “aims to partition the points into k groups

such that the sum of squares from points to the assigned cluster centres is minimized” (kmeans help page in

R).

K-means clustering requires the number of clusters (classes) to be defined in advance. The number of clusters

used in this exercise was calculated by running k-means with a set of number of clusters to be considered. For

each test run the total sum of squares within clusters (the sum of the distances of observations from the cluster

1 Field Manual for LiDAR Assisted Estimation of Forest Resources in Kenya, 2016

9

centres) was plotted. It is expected that the sum of distances will decrease rapidly as the number of clusters

increase but the curve will flatten after a while. The bend in the curve provides the optimal amount of clusters.

Based on the test runs and the plots (Figures 4 and 5), the number of clusters, i.e. homogenous classes in

respect of the selected features, chosen for k-means was 130 and 100 in the Kericho and Aberdare dataset,

respectively.

Figure 5. Plot of within-cluster sums of squares vs. number of clusters, Kericho dataset.

Figure 6. Plot of within-cluster sums of squares vs. number of clusters, Aberdare dataset.

10

The total numberof sample plots allocated to the two test areas was 524. It was calculated by assuming that in

2.5 months (50 working days) four field teams, which were the budgetary constraints for the field work, can

measure 3 and 2 plots per day in Kericho and Aberdare, respectively. The sample plots were divided between

the test areas proportional to their sizes: 371 plots inKericho and 153 plotsin Aberdare test site.

As our interest lied in woody vegetation, grid cells with very few vegetation returns (vegf<10%) were

excluded from the process. At the same time classes with very low h85f values (all or nearly all h85f values

equal zero in the same class) were also discarded It is important to state though, the estimation of forest

attributes using k-nearest neighbor method requires sample plots with “zero-volume”, the samples were

generated indoors.

The sample plots were allocated to the classes proportional to the area covered by a particular class. In order

to decrease time used for moving from one plot to the next, sample plots were grouped together. The size of

groups was defined according to the assumed amount of plots to be measured in one field day. A group

consisted of one preselected “main” plot and one or two “accompanying” plots. The groups were formed

using the following criteria:

main plots have to be at a distance of at least 300/500 m from any other plot (Kericho/Aberdare);

two plots from the same class are not allowed in the same group;

accompanying plots have to be at a distance of 100-200 m from the main plot;

accompanying plots have to be at a distance of at least 100 m from any other plot;

Main plots were allocated into the classes with the highest areal coverage. Each class was further classified

into subclasses. The number of subclasses was equal to the number of plots to be allocated into the class.

Accompanying plots were selected around the main plots from the classes with less coverage.

4.1.2 Training of field teams

Trainings of field teams were conducted during two separate expert missions in three sessions. The first

training session on GNSS measurements and post-processing of GNSS2 data took place on 19

th, 22

nd and 23

rd

September, 2014. During the training participants learned how to configure and carry out measurements using

Javad Triumph-2 GNSS units and Android smart phones. In addition the usage of RTKLIB in post-correcting

the acquired data was discussed in detail.

The second training session on field measurements was conducted near Nakuru from 30th September to 3

rd

of October 2014 in various vegetation types including cypress plantations and bamboo forest. The four days

field training included GNSS data collection with Javad units3, measurements of vegetation on sample plots

and data recording using the most recent version of OpenForis Collect Mobile (OFCM), and downloading and

importing data to Collect Server. The Open Foris is a free and open-source software package developed at the

Food and Agricultural Organization (FAO) for forest inventory data collection and management. Parameters

to be measured, recorded and measurement methods were discussed throughout the training and adjustments

to both the survey form and field manual were made.

The third training session was from 13th-15

th January 2015) and was found necessary because ground

truthing was delayed due to approaching rainy season and Christmas holidays. The session was conducted at

Makutano Forest Station near Eldama Ravine. The aim of the three days training was to refresh the knowledge

on GNSS data collection and tree measurements. The training included navigation to sample plots, data

collection with Javad units, measurements of the growing stock on sample plots and data recording using the

most recent version of OFCM. One team was trained on the use of the FZ-G1 Toughpad, i.e. rugged field

tablet for data recording.

2 RTKLIB manual on post-correction of GNSS data collected with Javad Triumph-2

3Manual on GNSS data collection with Javad Triumph-2

11

4.1.3 Execution of ground truthing

Four field teams comprising staff members of KFS and DRSRS carried out the ground truthing between

January and April, 2015. Two teams continued the field measurements in the Aberdare test site in September,

2015 to complete the number of sample plots required. A team consisted of the following members:

2 Inventory foresters

1 Local forester (ranger if measurements were carried out in a national park)

1 GNSS expert

2 Rangers

1 Driver

1 Casual

One of the inventory foresters was the crew leader and the other the assistant crew leader. The crew leader

was responsible for the day to day quantity and quality of the work done by the crew.

A base station for the GNSS measurements was set up for each test site. The base station was manned by one

GNSS expert from DRSRS. The expert in charge of the base station was responsible for the equipment and the

continuous data collection at the base during the field measurements each day.

The field work started immediately after the third training session on 16th of January 2015 in the Kericho test

area. Measurements in the Kericho area were completed on the 13th of April. A total of 360 plots were

measured. The main challenge in that area was the poor quality of GNSS measurements. Quality checks of the

collected GNSS data were done continuously and teams advised to use auxiliary points if necessary in order to

improve accuracy of positioning. Due to delay in the availability of international correction data whivh took

1-2 days after data collection, the teams re-visited the plots in question to re-collect GNSS data. It was

therefore important to mark the plot centers adequately.

As the ground truthing was carried out more than one year after the laser data acquisition, changes were

expected in vegetation, especially in forest cover. Altogether 51 plots had to be relocated due to a disturbance

such as clear-felling or removal of trees since February 2014. Instructions for adjustment of plot location were

given in the field manual4

In Kericho test site, there were 11 inaccessible sample plots which were not measured. because they were

located on steep slopes of Mount Londiani with very poor accessibility.

Field measurements on the Aberdare site started on 13th of April and were interrupted on 18

th of April due to

the rainy season which made the accessibility of sample plots impossible. Two field teams returned to

Aberdare to measure additional plots from 2nd

to 11th of September 2015. The total number of plots

measured was 100. The main challenge in this area was the terrain which was not taken into account in the

allocation of sample plots. Based on the experiences of the field teams this should have been considered and

areas with very steep slopes excluded from reference data collection. Accurate elevation models and slope

rasters could have been created based on the laser data.

4Field Manual for LiDAR Assisted Estimation of Forest Resources in Kenya, 2016

12

Figure 7. A sample plot in the Aberdare test sitea on steep slope with Mount Kenya in the background.

On two sample plots (ID 2027 and 2047) the slant instead of horizontal distances were recorded. Distances for

border trees were checked in the field therefore this has no impact on the usability of the data recorded on

these plots. However these plots cannot be used due to uncertainty and further research should been to

detrmine the effect of such plots on size of sample plots. various sample plot sizes.

In the pilot inventory of the IC-FRA project in 2013, PDAs (Personal Digital Assistant) were used for data

collection. Since then the Open Foris data collection package had been developed and Open Foris Collect

Mobile could be conveniently used on any kind of Android device including smart phones. Therefore in the

ground truthing smart phones procured for the pilot inventory were used for data recording. However, smart

phones were not efficient enough for collection of bulky data. On numerous sample plots more than 100 trees

were recorded which caused the phones to slow down. As a solution to the problem, 4 field tablets were

procured. The field teams found them to be very efficient in data collection because the above mentioned

performance problems did not occur even when handling large amount of data. The tablets worked for an

entire working day with one battery charge, though backup batteries were carried throughout the survey.

5 Data analysis

Data analysis was carried out in Finland by the remote sensing expert using open source and free software

(QGIS, R, rtklib). Colleagues of Kenyan partner organizations were then trained in the use of tools and

methods. Two training sessions were organized, the first in Nakuru between 23rd

and the 27th of October 2015

and the second in Nairobi from 15th to 19

th February 2016 with participants from all partner organizations of

the project (KFS, KEFRI, DRSRS and UoE).

Data analysis includes the following steps:

Calculation of forest attributes using tree measurements

Calculation of sample plot locations using GNSS data

Processing laser data and aerial imagery

Calculating estimates of forest attributes

13

5.1 Forest attributes

Tree data collected on the sample plots was checked and cleansed prior to further processing. Built-in error

checks and warnings of Open Foris Collect forms minimized the amount of errors in the data. Error checks

made the data collection more resource intensive. However use of field tablets solved this issue. Calculations

of forest attributes for the sample plots were carried out with R statistical software. The calculations are

described in the following paragraphs and the symbols used in the equation are listed in Table 1.

Table 1. Symbols used in the equations.

Symbol Description

d Diameter at the breast height, 1.3 m

h Height

n Count, number

π Pi

WD Wood density

V Volume

AGB Above ground biomass

GW Green weight

DW Dead wood

Tree heights

Calculations started by estimating tree heights for all tally trees (measured trees) applying lmfor R-package. A

height model was generated with the help of sample trees having measured height and diameter at breast

height. The height model can be calibrated with random effects at desired levels presuming the levels have

sample trees. Due to the selection method5of sample trees there was at least one sample tree on each plot

where trees were measured. Hence, the height model was calibrated at the plot level, i.e. heights of tally trees

within each plot were computed with the same model parameters and these parameters were allowed to vary

between different plots. Dead or broken trees were not included in the modelling phase.

Volume and biomass of trees

In the next step, volume (m3/tree) and biomass (kg/tree) for trees were estimated. There were only few

species-specific functions available and thus, volume and biomass for most of the trees were estimated by

common functions. Although there were species-specific equations for the most common tree species, the

equations needed variables that were not measured in the field. The species-specific volume equations used

were (Wanene, 1975):

VEucalyptus = e(-4.3687+1.8139*log10(d)+1.1111*log10(h))

VCupressus =0.013152-0.00005069 * d2+0.0001769 * d * h+0.00002895 * d

2* h

VRadiata = e(-4.2244+1.9056*log10(d)+0.9277*log10(h))

VPinus = -0.00041-0.00005711 *d2 +0.0001352 * (d*h)+0.00003313 * (d

2*h)

The common volume equation used were (Chidumayo 2012):

Vcommon = 0.67 * π * (0.01 * d / 2)2 * h.

The biomass for trees (kg) was estimated with the following equations (Henry et al., 2013):

AGBAcacia = 3.7704* bd +1.1682

5Field Manual: LiDAR Assisted Estimation of Forest Resources in Kenya, 2016

14

AGBRhizophora = 0.8069 * d2.5154

,

AGBcommon = e(0.93 * log((d^2 * h))-2.97)

.

The biomass equation of Acacia included variable “bd” meaning the diameter at the base of the tree. That was

not measured and therefore, the diameter at breast height was used instead. Thus, biomass of Acacia was most

probably underestimated.

Volumes and biomasses of trees were transformed to values per hectare (m3/ha and t/ha) with tree expansion

factors. As trees were measured within the sample plot with 15 m radius, each tree had the same expansion

factor (10000 / (π * 152) = 14.147. The same expansion factor was used for shrubs, bamboos and climbers,

which were also measured within the sample plot with a radius of 15 m.

Volume and biomass of bamboo

Measurements for unevenly distributed bamboos were taken on the entire sample plot, while evenly

distributed bamboos were measured on two subplots with radius 2.82 m6. The volume of bamboos was

calculated with the equation:

Vbamboo = d2 – (d * 0.7)

2 / 4 * π * h * 0.8

(National Forest and Tree Resources Assessment 2005–2007)

The biomass of bamboos was calculated with the equations developed by Muchiri and Muga (2013). These

functions give biomass of merchantable bamboo. The green weight was calculated with the following

equations:

GWbamboo = -1.11 + 0.36 * d2 bamboo diameter > 3 cm

GWbamboo = -1.11 + 0.36 * 3.12 bamboo diameter < 3 cm

In the formula a constant“3.1” was used for small bamboos instead of diameter as the volume function gave

negative values for bamboo shoots less than 3 cm in diameter.

And then the dry weight (biomass) of bamboo was calculated with the formula:

AGBbamboo = 1.04 + 0.06 * d * GWbamboo

Volume and biomass of climbers

Climbers with a diameter equal or larger than 2 cm were measured in the field. For volume calculation the

cylinder equation was used:

Vclimber = π * (d/2)2 * h

The above ground biomass was calculated as follows:

AGBclimber= Vclimber * WD

WD was set to 0.6 (Henry et al., 2010).

Volume and biomass of small tress

Information on small trees was measured and recorded on four circular 2.82 m radius sub-plots. Amount (n) of

small trees with a height of at least 1.3 m and a diameter (d)of less than 50 mm was recorded along with the

average diameter (d) and height (h) in each subplot separately.

6Field Manual: LiDAR Assisted Estimation of Forest Resources in Kenya, 2016

15

Volume of small trees in a subplot was calculated using the volume equation of cone:

Vsmalltrees = (π * (d/2)2 * h)/3 * n

For biomass calculations an average wood density of 0.59 tons/m3 was used as tree species were not recorded

on the field(Henry et al., 2010).

AGBsmalltrees= Vsmalltrees* WD

After subplot-level calculations, results were expanded to the entire plot using expansion factor (152* π)/

(2.822* π) = 28.293. Then the same expansion factor as for treeswas applied.

Biomass of shrubs

The lack of suitable volume and biomass equations for shrubs was realized only in the data analysis phase.

The data recorded in the field was average height and canopy coverage in 4 classes, relative to the entire area

of the plot (<10%, 10%-39%, 40%-69%, >=70%), was insufficient for reliable estimates of biomass.. Most

equations for shrubs had explanatory variables which were not measured. However it was important to

account for biomass of shrubs as well.

Consequently, the following biomass model for shrubs (Mwakalukwa, 2014) was applied though estimation of

additional variables was required:

ln(AGBshrubs) = -1.8567+2.2145 * ln(d) + 0.1591 * ln(h)

Characteristics of shrubs in the study(Mwakalukwa, 2014) were:

height range: 2.5-8 m

DBH range: 1.8-21.5 cm

The mean DBH for individual shrubs on a sample plot was approximated using the DBH distribution of

indigenous small sampletrees measured on all sample plots in the test area. Only those trees were taken into

account, whose height exceeded by maximally 2.5 m the highest average shrub height. The range of average

shrub height recorded in the field was 1-7.5 m. Linear regression was fitted to predict the DBH of a tree using

the tree height as independent variable. The regression equation was used to calculate the mean DBH of

shrubs on the plot based on the average height measured in the field. The biomass model was then applied to

estimate the biomass of an average shrub on the plot.

Next, the number of shrubs on the plot was estimated with the help of the canopy coverage recorded in the

field. Absolute canopy coverage of shrubs on the plot was obtained from the mean relative canopy coverage in

each coverage class multiplied by the plot area:

1 - <10%: 0.05* (152 * π) = 35.34 m

2

2 - 10%-39%: 0.245 * (152 * π) = 173.18 m

2

3 - 40%-69%: 0.545 * (152 * π) = 385.24 m

2

4 - >=70%: 0.85 * (152 * π) =600.83 m

2

The percentage coverage of shrubs on the plot were estimated based on an average canopy cover of 10 m2

per

shrub, which was chosen after visual interpretation and measurements on the aerial imagery. After that, the

total shrub biomass on the plot was calculated and converted to biomass per hectare.

The results of the biophysical survey without added zero plot data are presented in the tables below. All

biomass values are above ground biomass (AGB). Total biomass is the sum of trees’, bamboo’s, climbers’,

small trees’ and shrubs’ biomasses. Estimations were carried out on total biomass.

16

Table 2. Results of biophysical survey in Kericho.

Biomass (Mg/ha) Minimum Maximum Mean Standard

deviation

Trees 0 715.778 141.565 158.081

Bamboo 0 43.400 0.652 4.405

Climbers 0 86.559 1.682 6.575

Small trees 0 22.493 0.584 1.899

Shrubs 0 22.878 2.567 3.886

Total biomass 0.183 715.778 147.051 157.370

Table 3. Results of biophysical survey in Aberdare.

Biomass (Mg/ha) Minimum Maximum Mean Standard

deviation

Trees 0 996.460 97.710 150.968

Bamboo 0 220.017 23.771 32.998

Climbers 0 14.602 0.385 1.753

Small trees 0 6.187 0.256 0.898

Shrubs 0 19.390 2.260 4.240

Total biomass 5.864 1003.250 124.382 146.614

5.2 Processing GNSS data

GNSS data was collected on all sample plots and a base station set separately for each test area. The aim of the

GNSS data collection was to locate the sample plot centers with an accuracy of 2-3 decimeters. This enables

the spatial matching of remotely sensed and ground truth data. In GNSS data processing rtklib version 2.427an

open source program package for GNSS positioning was used.

The first step was to calculate the position of the base station by utilizing the correction data collected and

organized by the International GNSS Service (IGS). Correction data (ephemerides, Earth rotation parameters

and satellite clock) as well as international base stationdata was downloaded for each day of data collection.

On the Kericho test site data recorded by MOIU (station at Moi University) and on the Aberdare test site by

RCMN(station at the Regional Centre for Mapping of Resources for Development in Nairobi) was used to

calculate the base station’s position.

Once the exact location of the base station was defined, it was used to correct the GNSS data collected by the

rovers, i.e. the GNSS units at the sample plots. All calculations were carried out in R by invoking rtklib’s

command line tools. For full details of the process see the manual.

In Kericho roughly one third of the GNSS measurements turned out to be poor partly because of very dense

vegetation, partly for inexplicable reasons. It was important to check the GNSS data once the correction files

became available. The person in charge of the base station would have had enough time to carry out the

calculations, but battery life of the computer in use was limited and no power source was available. In some

cases it was possible to revisit the plot and collect new GNSS data.

In Aberdare the accuracy level of GNSS positioning was considerably better. This might be due to different

vegetation cover: absence of dense plantations and good signal even under dense bamboo coverage.

Statistics on GNSS positioning are presented in the tables below. The figures are calculated based on the

positions collected on the field and post-processed in rtklib. Positions were recorded in 5 seconds interval at

the sample plot centers. After post-processing the magnitude of the positions’ scatter in East-West and North-

7 RTKLIB manual on post-correction of GNSS data collected with Javad Triumph-2

17

South direction was calculated. Higher scatter results in higher uncertainty in positioning. All data collected at

sample plot centers and auxiliary points used in positioning were taken into account.

Table 4. Statistics of GNSS data collected in Kericho.

Min (m) Max (m) Mean (m)

Scatter E-W 0.0 77.3 5.3

Scatter N-S 0.0 67.1 2.5

Table 5. Statistics of GNSS data collected in Aberdare.

Min (m) Max (m) Mean (m)

Scatter E-W 0.1 50.9 4.9

Scatter N-S 0.0 67.5 2.5

5.3 Pre-processing LiDAR data and aerial imagery

Remotely sensed data delivered by TerraTec had to be pre-processed before estimation of forest attributes.

Processing steps of LiDAR data and aerial imagery are described below

LiDAR data was delivered in las file format, 2.5x2.5 km blocks in Kenyan coordinate system Arc 1960/UTM

zone 37s. The laser returns were classified as ground, noise and unclassified; heights were calculated above

the reference ellipsoid.

The first processing step was done using las2txt.exe from the Lastools software package by rapidlasso GmbH

(http://rapidlasso.com/lastools/). Las files were transformed to ASCII files which are compatible with the tools

used in further steps. The output ASCII files contained the following attributes: x, y, z coordinates, intensity,

class of return, amount of returns of given pulse, number of return.

All steps following were executed in Linux environment using awk Linux tool and a software package coded

by Luke’s researcher Juho Pitkänen (tool used for the particular step). Firstly, points classified as noise and

with zero amounts of returns were removed. The latter was caused by a bug in the processing software of the

LiDAR data (awk).Then the height above ground (HAG) for each laser return was calculated using k-nearest

neighbor method combined with inverse distance weighting (jg_txt_height).The number of nearest neighbors

was set to 4and a weighting parameter to 1 (the closer the nearest neighbor, the higher the weight). The results

of that step were used in 3D feature calculations.

In the next step the first returns (number of return=1) were selected and Digital Canopy Models (DCM) were

created (jg_poi_idw). The points with a HAGvalue of 60 meters and above as well as below -1 meters were

excluded and negative HAG values were set to zero. The number of nearest neighbors was set to 3 and a

weighting parameter to 2.The resolution of the resulting rasters was 0.5 m.

Other features calculated were the Haralick texture features (Haralick et al., 1973). Texture features are based

on grey-level co-occurrence matrices. In order to decrease variation in DCMs, pixel values were reclassified

into 16 classes. Due to the reclassification pixels with close values (e.g. 24.5, 24.3 or 0.4, 0.2, etc.) got the

same value and the textural variation between different vegetation or land cover types was detectable. The

same reclassification matrix was used for the entire set of DCMs. The matrix was calculated based on one

DCM image, which had the widest range of pixel values. In order to include all pixel values of all the DCM

images in the reclassification, the upper limit of the last class was set to infinite.

Two sets of aerial images were delivered, true color and color infrared, both with 25 cm resolution. Only color

infrared images were used in the exercise. Images were first resampled to 50 cm which was sufficient for

18

estimating forest attributes at the stand level. The images were reclassified for calculation of Haralick’s

texture features in the same way as the DCMs. All raster image sets were merged using a virtual raster format.

These virtual mosaics were used in all following steps.

5.4 Feature selection and biomass estimation

As it was mentioned in the introduction, features extracted from laser data and aerial imagery can be used to

estimate forest attributes, e.g. biomass. Features are numerical representations of characteristics of remotely

sensed data covering a certain area. The total amount of features extracted for reference sample plots was 130.

However some of the features might not correlate with forest attributes of interest and some features might

correlate between each other. In order to exclude such features feature selection needed to be done. Feature

selection was carried out for the two test sites separately.

In both feature extraction and biomass estimation the k-nearest neighbor (k-NN) method was applied. In the k-

NN method forest attributes are estimated for a certain areausing knearest neighbors of a reference data set

with known attributes. Nearest neighbors are selected based on the similarity of features extracted from the

target area and the features of the reference data set’s members.

5.5 Feature extraction

3D features were extracted from laser point clouds using R. For each sample plot laser returns within a

distance of 15 meters from the plot center were extracted and a set of features calculated based on the above

ground height of each return.

Table 6. Table of 3D features.

Name Description Literature

avgf Mean values H ; canopy points, H >= 2 m,

first pulse

Packalén et al. (2008)

havgl Mean values H ; canopy points, H >= 2 m,

last pulse

Packalén et al. (2008)

hstdf Std of H ; canopy points (H >= 2 m), first

pulse

Packalén et al. (2008)

hstdl Std of H ; canopy points (H >= 2 m), last

pulse

Packalén et al. (2008)

h0f,h5f,...,h90f Percentiles of the first pulse laser canopy

heights for

0%, 5%, 10%, 20%, ... , 90%, 95%, 100%

(m);

Naesset (2004a);

Packalén&Maltamo (2006)

s.615

h0l,h5l,...,h90l Percentiles of the last pulse laser canopy

heights for

0%, 5%, 10%, 20%, ... , 90%, 95%, 100%

(m)

Naesset (2004a);

Packalén&Maltamo (2006)

s.615

hcvf Coefficient of variation of the first pulse laser

canopy heights (%)

Naesset (2004a)

hcvl Coefficient of variation of the last pulse laser

canopy heights (%)

Naesset (2004a)

vegf Proportion of above-ground hits, first pulse Packalén et al. (2008)

vegl Proportion of above-ground hits, last pulse Packalén et al. (2008)

19

Note:

Height range is

divided into

equal intervals:

(see d*

variables

below)

The range between the lowest laser canopy

height (>2 m) and the maximum canopy

height, as derived from the first pulse canopy

height distribution, was divided into 10

fractions of equal length. Canopy densities

were then computed as the proportions of first

pulse laser hits above fraction

no. 0 (2m), 1, . . . ., 9 to total number of first

pulses.

Corresponding densities were also computed

for the last pulse data.

Naesset (2004a)

(*+++) height is divided

into classes of equal length,

(not equal proportions)

d0f,...d9f Canopy densities corresponding to the

proportions of first pulse laser hits above

fraction no. 0, 1,...,9 to total number of first

pulses

Naesset (2004a), p.171

(*+++)

d0l,...,d9l Canopy densities corresponding to the

proportions of last pulse laser hits above

fraction no. 0,1,...,9 to total number of last

pulses

Naesset (2004a), p. 171

(*+++)

p20f,...p95f Canopy height percentiles for 20%, 40%,

60%, 80%, and 95% (h_20,...,h_95) were

computed and the corresponding

proportional canopy densities (p20,...,p95)

were computed for each of these quantiles.

Canopy density p_x is the proportion of ALS

hits accumulated at height h_x.

Packalén&Maltamo (2008, p. 1753),first

pulses.

Packalén&Maltamo (2008)

s.1753

p20l,...p95l Canopy height percentiles for 20%, 40%,

60%, 80%, and 95% (h_20,...,h_95) were

computed and the corresponding

proportional canopy densities (p20,...,p95)

were computed for each of these quantiles.

Canopy density p_x is the proportion of ALS

hits accumulated at height h_x.

Packalén&Maltamo (2008, p. 1753), last

pulses.

Packalén&Maltamo (2008)

s.1753

pcgf Proportions of canopy hits versus ground hits

(0-100%), first pulses

Packalén&Maltamo(2006)

s.615

pcgl Proportions of canopy hits versus ground hits

(0-100%), last pulses

Packalén&Maltamo (2006)

s.615

pghf Proportion of ground hits, first pulses Packalén&Maltamo (2006)

s.615

Raster features were extracted from the green, red and near infrared channels of the aerial imagery, and from

the DCMs using Juho Pitkänen’s toolset.The raster features included11 Haralick texture features which were

calculated using a 26.5 by 26.5 meters processing window around the plot center. In the processing window

pixels 2.5 meters apart were taken into consideration enabling the detection of textural structure of forest

canopy. The Haralick features can be calculated at 0, 45, 90 and 135 degrees. However, these features are

highly correlatedwith each other, and therefore, averages of the four directions are calculated automatically by

the tool.

20

Additionally mean and standard deviation of pixel values falling inside each sample plot were calculated in R.

A pixel was considered to be inside the sample plot when its center was not further than 15 meters from the

plot center.

Table 7. Table of raster features.

Name Description Literature

*_ASM Angular Second Moment

Haralick et al., 1973

*_Contr Contrast

*_Corr Correlation

*_Var Variance

*_IDM Inverse Diff Moment

*_SA Sum Average

*_SV Sum Variance

*_SE Sum Entropy

*_Entr Entropy

*_DV Difference Variance

*_DE Difference Entropy

*_mean Average values of aerial image bands and DCMs Tuominen&Haapanen (2008)

*_sd Standard deviations of aerial image bands and

DCMs

Tuominen&Haapanen (2008)

Asterisks mark prefixes in feature names:

h – DCM

green – aerial image, green band

red – aerial image, red band

nir – aerial image, near infrared band

5.6 Feature selection

Feature selection was necessary in order to use an optimal set of features in the estimation of forest attributes.

Feature selection was done using a genetic-algorithm-based approach implemented in R (genalg package).

First, a certain amount (population size) of feature sets (chromosomes) was randomly generated. The fitness

of chromosomes was evaluated using the leave-one-out cross-validation method. In the cross-validation

biomass was estimated for one sample plot by means of the features (genes) in the feature set and the

measured biomass values of all the other sample plots in the reference data. The estimation was done using the

k-NN method. Selectionof nearest neighbors was based on weighted Euclidean distances in the f-dimensional

feature space where f equals to the number features in the feature set. Larger weight parameter (g) gives less

weight to a nearest neighbor on a larger distance. Cross-validation was done for each chromosome n times

where n equals to the amount of sample plots.

Given the estimates and measured biomass values root mean square error (RMSE) and bias were calculated.

The genetic algorithm was aiming to minimize RMSE and bias calculated for the chromosomes. This was

done by carrying out the aforementioned stepsi times where i is the number of iterations set by the user.

Between iterations chromosomes exchanged genes (mutation) catalyzed by parameters such as mutation

chance (the chance that a gene in the chromosome mutates (changes)) and elitism (the number of

chromosomes kept unchanged into the next generation (iteration)).Optimal number of nearest neighbors and

weight for distance weighing was selected based on a number of test runs. The optimal chromosome i.e.

feature set was the one with minimal RMSE and bias. Finally weights for features in the optimal feature set

were calculated using the same approach. Feature weighing enabled further minimization of RMSE and bias

(see Table 8.)Table 9 and Table 10). Feature selection was carried out first separately for the test sites, then

taken sample plots from both sites into account.

21

In the Kericho test site the optimal amount of nearest neighbors (k) was found to be 4 and the parameter for

inverse distance weighing (g) of 1.5 was used. The total amount of sample plots used was 352. Some plots

were excluded due to uncertain location information and 3 so called zero plots (plots with zero woody

biomass) were added. Feature selection resulted in the following set of features (see Table 6 and Table 7 for

descriptions):

3D features: havgf, h10f, h60l, h70l, vegl, d0f, d3f, d2l, pghf, i40f;

Raster features extracted from DCMs: h_mean, h_sd, h_ASM, h_Contr, h_IDM, h_DV;

Raster features extracted from aerial image bands: nir_sd, nir_Entr, nir_DE.

The total number of selected features was 19, of which 3 features were extracted from aerial image bands.

This was to be expected as three dimensional datasets tend to have a strong correlation with forest attributes

like biomass or total growing stock.

Table 8. Statistics of estimates usingall and selected features in Kericho.

Total AGB (Mg/ha) RMSE Bias Relative RMSE

(%)

Relative bias

(%)

All features in use 69.3 -6.2 47.5 -4.2

Selected features 50.3 0.0 34.5 0.0

Selected features

with weighing 48.0 -0.1 32.9 0.0

In the Aberdare test site the optimal amount of nearest neighbors (k) was found to be 3 and the parameter for

inverse distance weighing (g) of 2.5 was used. The total amount of sample plots used was 101. Some plots

were excluded due to uncertain location information and zero plots were added. Feature selection resulted in

the following set of features:

3D features: h5f, h20f, h30f, h60l, h95l, d5l, p20f, p40f, p80f, pcgf;

Raster features extracted from DCMs: h_sd, h_ASM, h_SA, h_DV;

Raster features extracted from aerial image bands: red_ASM.

The total number of selected features was 15, of which one feature was extracted from aerial image bands.

Table 9. Statistics of estimates using all and selected features in Aberdare.

Total AGB (Mg/ha) RMSE Bias Relative RMSE

(%)

Relative bias

(%)

All features in use 89.2 -7.5 73.9 -6.2

Selected features 72.5 -0.1 60.1 -0.1

Selected features

with weighing 71.1 0.0 58.9 0.0

When processing both test sites at the same time the optimal amount of nearest neighbors (k) was found to be

6 and the parameter for inverse distance weighing (g) of 3 was used. The total amount of sample plots used

was 453. Some plots were excluded due to uncertain location information and zero plots were added. Feature

selection resulted in the following set of features:

3D features: havgf, h30f, h85f, h95f, h70l, h85l, hcvf, vegl, d0f, p20f, p95f, p20l;

Raster features extracted from DCMs: h_mean, h_Var;

Raster features extracted from aerial image bands: nir_Contr, red_DE.

The total number of selected features was 16, of which two features were extracted from aerial image bands.

22

Table 10. Statistics of estimates using all and selected features in Aberdare and Kericho.

Total AGB (Mg/ha) RMSE Bias Relative RMSE

(%)

Relative bias

(%)

All features in use 73.1 -4.7 52.2 -3.3

Selected features 60.6 0.0 43.3 0.0

Selected features

with weighing 57.9 0.0 41.3 0.0

5.7 Biomass estimation

The main objective of the exercise was to produce a biomass map over the test areas. For the biomass

estimation, grids covering each test site were generated. The size of grid cells was defined taking the area of

the sample plots used for ground truthing and practical aspects into consideration. The area of one sample plot

was approximately 707 m2 corresponding to a cell size of 26.59 x 26.59 m

2. As round coordinates are easier to

work with, a grid cell size of 25x25 m was chosen. The comparativeness of sample plots’ and grid cells’

features is not influenced by the negligible difference of the area covered (707 vs 625 m2).

Selected features were extracted for each grid cell and the k-NN method was used to calculate biomass

estimates for each cell (see section0). For the estimation only the selected feature set and the reference data of

the respective test site was used. Parameters k and g were set to the same value as in the feature selection. The

results were saved in vector and raster format. The test sites were processed separately.

6 Results

Table 11and Table 12shows the statistics of measured and estimated total biomass on reference plots in

Kericho and Aberdare respectively. Negative bias means that the method applied underestimated biomass in

the reference data set. It is important to mention that error estimates are only valid for the reference data set

and cannot be used as error estimates for the entire test site. Estimates on higher scale like forest stand level

are more accurate as errors of individual estimation units are balanced out.

Table 11. Statistics of measured and estimated total biomass on sample plots in Kericho.

Total AGB (Mg∙ha

-1) Measured Estimated

Range (min-max) 0-716 3-636

Mean 147 146

SD 157 149

RMSE 48.0

Bias -0.1

Relative RMSE (%) 32.9

Relative bias (%) 0.0

Table 12. Statistics of measured and estimated total biomass on sample plots in Aberdare.

Total AGB (Mg∙ha-1

) Measured Estimated

Range (min-max) 6-1003 0-477

Mean 124 121

SD 147 117

RMSE 71.1

Bias 0.0

Relative RMSE (%) 58.9

Relative bias (%) 0.0

23

Table 13. Statistics of measured and estimated total biomass on sample plots on both test areas.

Total AGB (Mg∙ha-1

) Measured Estimated

Range (min-max) 0-1003 0-477

Mean 140 140

SD 155 141

RMSE 57.9

Bias 0.0

Relative RMSE (%) 41.3

Relative bias (%) 0.0

In Figure 8, Figure 9 and Figure 10, estimated biomass is plotted against biomass calculated from field

measurements in Kericho, Aberdare and taken both test sites into account, respectively.

Figure 8. Scatter plot of measured and estimated AGB on sample plots in Kericho.

24

Figure 9. Scatterplot of measured and estimated AGB on sample plots in Aberdare.

Figure 10. Scatterplot of measured and estimated AGB on sample plots on both test areas.

Rasterized representation of the results is shown in Figure 11. The resolution of the biomass map is 25x25 m,

biomass increases from red to green.

25

Figure 11. False color infrared image and biomass map in Kericho.

7 Discussion

Plot level accuracy of the biomass estimates was unexpected poor. Reasons for that were explored and are

discussed as the following. Visual verification of the biomass maps revealed that the ground truthing was not

completely successful. In Kericho test site most of the forested areas were uniform plantations, however,

biomass estimates for neighboring cells in the same plantation could vary significantly. In Figure 12for the

grid cell around the center 528 Mg/ha biomass was estimated whereas for the neighboring cell to the northeast

only 419 Mg/ha. To verify the similarity of the cells digital canopy model is set as background in Figure 13.

Figure 12. Biomass estimates with false color infrared imagery.

26

Figure 13. Biomass estimates with DCM.

One possible reason for errors of this kind is the uncertainty in the positioning of sample plots. Results

showed that Javad GNSS units could not record the position accurately under dense vegetation cover (see

Figure 14). In the Kericho test site a high number of unthinned, extremely dense coniferous plantations were

found. About one third of the sample plot positions were poor which means that post-processed GNSS records

for the plot center had an average scatter of approximately 7 and 14 meters in East-West and North-South

direction respectively. In some cases all points collected on the same sample plot fell within several square

millimeters (see Figure 15).

Figure 14. Uncertain plot position under dense canopy cover in Kericho.

27

Figure 15. Successful GNSS measurement in Kericho (grid size 1 mm).

It is important to mention, that GNSS data collected in the Aberdare test site was considerably better. This

might be due to the fact that typical vegetation type was bamboo in that area which apparently did not

interfere with GNSS satellite signals (see Figure 16).

Figure 16. Successful GNSS measurement in Aberdare (grid size 5 mm).

28

Figure 17. Grid cells with very light vegetation

Another typical error in the biomass map was that grid cells with nearly no vegetation got inaccurate biomass

estimates, as can be observed in Figure. This is due to the lack of similar sample plots, in this case sample

plots with low biomass values, in the reference data set. Nearest neighbors selected from the reference data

were in fact much too different from the target grid cell which led to overestimation of biomass.

In addition to the issues with positioning and the k-NN method, there were problems concerning the accuracy

of the field estimates. Measurements on shrubs were limited to average height and canopy cover expressed in

4 levels. Review of the literature revealed that field measurement methods and biomass models for African

shrub species are not common. Based on the articles found, and the attributes measured, only a vague estimate

of biomass of shrubs was possible. The accuracy of biomass estimates for shrubs in the ground truth is of great

importance as the biomass of shrubby vegetation can be a significant portion of total above ground biomass in

certain areas. In some sample plots shrubs were the only vegetation cover. On the sample plot shown in Figure

shrub coverage was over 70 % and the average shrub height 2.5 m. There was no other woody vegetation

recorded on the plot.

29

Figure 18. Sample plot in the Kerichotest site covered by shrubs.

There was a significant difference between the RMSE values calculated for the two test sites (33% and 59%).

One reason for the higher RMSE in the Aberdare test site was most probably high variation in vegetation

which was not properly covered in the ground truth. In the Kericho area (approx. 425 km2) a total of 349

sample plots were used (1.22 km2/plot)while in Aberdare (approx. 175 km

2) only 98 plots were usable (1.79

km2/plot).In Kericho the vegetation cover was mostly uniform plantations as opposed to natural vegetation in

Aberdare with more variance, which could not be covered by the small amount of plots measured. Another

reason is plot no. 2141 with extremely high amount of biomass (1003 Mg/ha) caused by one tree on the plot

with DBH over 2 m.

The second highest measured biomass was 525 Mg/ha. This means that using 3 nearest neighbors the estimate

for plot no. 2141 will be heavily underestimated as it is shown in Figure 9 (point in the right of the plot

area).The effect of this outlier was tested by excluding it from the feature selection process. The relative

RMSE decreased from 58.9 % to 41.3 %. Exclusion of outliers from the reference data is however not

advisable. Good practice is to measure a representative set of reference sample plots in the target area. As it

was mentioned in section 0 terrain should be taken into account during sample plot allocation.

In the calculation of 3D features laser pulses were considered to be reflected by vegetation if their above

ground height was not less than 2 meters. This approach is widelyused in boreal forestsbut needs to be

reconsidered in order to match field measurements (minimum height of shrubs, trees to be recorded) to the 3D

features.

8 Future use of the data

The LiDAR data and aerial imagery acquired in the IC-FRA project together with the sample plot data offer

several possibilities for further research. The most interesting subject would be to investigate how RMSE and

bias change by decreasing sample plot radius. The distance of each tallied tree from the plot center was

recorded in the field. This makes it possible to carry out feature selection by using trees, for example, within

12 or 9 meters radius. Smaller plot radius would increase time and cost efficiency of ground truthing and

make it possible to estimate attributes for finer resolution (20x20 m grid).

30

Due to lack of time in the IC-FRA project feature selection and forest attribute estimation was carried out only

for biomass. Other forest attributes correlating with laser and image features, for example, total growing stock

or Lorey’s height could be estimated similarly.

For verification and to get an idea of stand level accuracy, the estimated forest attributes should be compared

with a separate field data collected with, for example, dense systematic sampling. With the help of digital

stand delineation, estimated forest attributes for stands can be calculated as an average of grid cells within

each stand. This way the results would be comparable with already existing data, for example, from the forests

managed by the Kenya Forest College in Londiani, as it was initially planned for. The ALS data acquired

could be utilized also in teaching of forestry students to introduce modern technology in forest data collection.

31

References

http://cs.uef.fi/~lamehtat/documents/lmfor.pdf

Chidumayo, E.N.,2012,Assessment of Existing Models for Biomass and Volume Calculations for Zambia,

Report prepared for FAO-Zambia integrated land use assessment (ILUA) phase II project, 58 p.

Haralick R., Shanmugam K., Dinstein I., 1973, Textural Features for Image Classification, IEEE Trans. on

Systems, Man and Cybernetics, vol. SMC–3, No. 6: 610–621 p.

Henry, M., Besnardd A., Asantee W.A., Eshunf J., Adu-Bredug S., Valentinic R., Bernouxb M., Saint-Andréh

L., 2010, Wood density, phytomass variations within and among trees, and allometric equations in a tropical

rainforest of Africa, Forest Ecology and Management, vol. 260: 1383 p.

Mwakalukwa, E.E., Meilby H., Treue T., 2014, Volume and Aboveground Biomass Models for Dry Miombo

Woodland in Tanzania. International Journal of Forestry Research, vol. 2014, article ID 531256

Næsset, E., 2002, Predicting forest stand characteristics with airborne scanning laser using a practical two-

stage procedure and field data, Remote Sensing of Environment, vol.80: 88-90 p.

Næsset, E., 2002, Predicting forest stand characteristics with airborne scanning laser using a practical two-

stage procedure and field data, Scandinavian Journal of Forest Research, vol19:164-179 p.

Packalén, P., Maltamo, M., 2006, Predicting the plot volume by tree species using airborne laser scanning and

aerial photographs, Forest Science, vol. 52(6), 611–622 p.

Packalén P., Maltamo M., 2008, Estimation of species-specific diameter distributions using airborne laser

scanning and aerial photographs, Canadian Journal of Forest Research, vol.38(7): 1750-1760 p.

Packalén, P., Suvanto, A., Maltamo, M., 2008, A two stage method to estimate species-specific growing stock

by combining ALS data and aerial photographs of known orientation parameters, Manuscript IV in

Dissertationes Forestales 77

Tuominen S., Haapanen R., 2013, Estimation of forest biomass by means of genetic algorithm-based

optimization of airborne laser scanning and digital aerial photograph features, Silva Fennica, vol. 47, no. 1,

article ID 902

Vehmas, M., Peuhkurinen, J., Eerikäinen, K., Packalén, P.,Maltamo, M., 2009, Identification of boreal forest

stands with high herbaceous plant diversity using airborne laser scanning, Forest Ecology and

Management,vol.257: 46-53 p.

Wanene, A.G. 1975, A provisional yield table for Pinus Patula grown in Kenya. FD technical note 143.