Modeling Species Distribution with MaxEnt
Bryce Maxell, Acting Director, Montana Natural Heritage Program
& Scott Story, Nongame Data Manager, Montana Fish, Wildlife and Parks
Agenda - Wednesday
• 8-9 Introduction to MaxEnt
• 9:05-10 Reptile and Amphibian Model Examples
• 10:05-11 Installation and Walkthrough of MaxEnt
• 11:05-12 Preparation of Data
• 12-1 Lunch
• 1-1:55 Thresholds & Model Validation
• 2-3 Using models in your DSS
• 3-5 Hands-on Session
• Tomorrow 8-11 Hands-on, Data Prep, Questions & Discussion
• About to start again folks on the phone.
INSTALLATION
Installing and Running MaxEnt
Download & Install
• http://www.cs.princeton.edu/~schapire/maxent/
• Current MaxEnt Version = 3.3.3e
• Requires Java Version 1.4 or later
• Type java -version at command prompt
• http://www.java.com
• Extract the .zip file to a very simple directory
– No spaces, no strange characters, short
– C:\maxent
• Three files are installed
– Maxent.bat
– Maxent.jar
– Readme.txt
– Download the tutorial Word document
Check Java Version
Set PATH and customize .bat file
• My Computer > Properties > Advanced > Environment Variables > System Variables > PATH > Edit
• Add ;c:\maxent to the end of the PATH
• Change the maxent.bat file
– Change the extension to .txt so that you can edit it with Notepad
– Change line reading java -mx512m -jar maxent.jar to…
– java -mx512m -jar c:\maxent\maxent.jar
– Change the extension back to .bat
– Note that changing the 512 to another number allocates more memory
512 MB = 0.5 GB
1024 MB = 1 GB
1536 MB = 1.5 GB
2048 MB = 2 GB
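The edited .bat line is a single java command with a memory setting, so it can also be assembled from a script. The following is a minimal Python sketch, assuming the c:\maxent install location used above; the function names maxent_command and launch_maxent are hypothetical and not part of MaxEnt.

```python
import subprocess


def maxent_command(jar_path=r"c:\maxent\maxent.jar", memory_mb=512):
    """Build the same command line that the edited maxent.bat runs."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    # -mx<N>m is the memory flag from the .bat file (N in megabytes)
    return ["java", f"-mx{memory_mb}m", "-jar", jar_path]


def launch_maxent(jar_path=r"c:\maxent\maxent.jar", memory_mb=512):
    """Start MaxEnt; assumes Java is on the PATH and the jar exists."""
    return subprocess.Popen(maxent_command(jar_path, memory_mb))
```

Calling maxent_command(memory_mb=1024) gives the 1 GB variant of the command without touching the .bat file.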
BASIC MODELING RUN
Running MaxEnt
Required Inputs
• Species presence localities (“samples”) file
• Environmental feature layers
• Output directory
MaxEnt – Main Screen
Supply presence localities
Supply folder containing
environmental feature layers
Change variable types as necessary
Supply an output directory
Ready to Run
What MaxEnt Does
• Reads through each layer to
– Determine type
– Create .mxe file for each layer in maxent.cache
• Extracts the random background and sample data
– You will get warnings about points that are “missing some environmental data”
• Calculates the gain until a threshold is reached
• Creates the output grids for each species (this takes the longest)
• Creates the thumbnail .png images
Time Required
• Ten feature layers (3 categorical)
– 46 million pixels
• 2 Species
• Intel Core 2 Quad CPU (2.83 GHz)
• 4.00 GB RAM
• Windows 7
• 32-bit Operating System
• 512 MB of memory specified
Without maxent.cache = 38 minutes
With maxent.cache = 24 minutes
EXAMINING OUTPUT
Running MaxEnt
Output
• plots folder
• logfile
• maxentResults.csv
• For each species
– .asc
– .html
– .lambdas
– _omission.csv
– _sampleAverages.csv
– _samplePredictions.csv
Logfile
• Timestamp
• Version of MaxEnt
• Samples file name
• Warnings
• Command line to repeat
• Species
• Layers
• Layertypes
• Directories for: samples file, layers, output
• Number of samples
• Maximum gain
Gain
• Closely related to deviance, a measure of goodness of fit (GOF) in GAM and GLM
• Starts at zero and heads toward an asymptote
• MaxEnt is trying to come up with the best fit
• Average log probability of presence samples minus a constant
• Gain indicates how closely the model is concentrated around presence samples
• Avg likelihood of presence samples = exp(gain)
Gain Examples
• McCown’s Longspur
– Resulting gain: 2.275
– Average likelihood for presence points = 9.728
• Olive-sided Flycatcher
– Resulting gain: 1.297
– Average likelihood for presence points = 3.658
• Average likelihood of the presence sample is X times higher than that of a background pixel
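The two average likelihood values follow directly from exp(gain). A short Python check, using only the gains reported above (the function name is hypothetical):

```python
import math


def average_likelihood_ratio(gain):
    """How many times more likely a presence sample is than a background pixel."""
    return math.exp(gain)


gains = {"McCown's Longspur": 2.275, "Olive-sided Flycatcher": 1.297}
for species, gain in gains.items():
    print(f"{species}: exp({gain}) = {average_likelihood_ratio(gain):.3f}")
```

This reproduces the 9.728 and 3.658 figures shown for the two species.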
Html
• Analysis of omission/commission
• Receiver Operating Characteristic (ROC) Curve (AUC calculated)
• Preset Thresholds
• Pictures of the Model
• Analysis of Variable Contributions
• Raw Outputs
Omission Rate vs. Cumulative Threshold
Receiver Operating Characteristic (ROC) Curve
Sample Predictions File
• Coordinates for all points
• Test or Training
• Predicted values
– Raw
– Cumulative
– Logistic
• Use this file to calculate deviance
• Use samples procedure in ArcMap to extract the ones and zeros (above threshold or not)
Sample Predictions File
Logistic Output
High probability of suitable conditions
Low predicted probability of suitable conditions
White dots = training (1059 points or 75%)
Purple dots = test (352 points or 25%)
Viewing Data in ArcMap
• Build Raster Attribute Table (Categorical)
– .vat.dbf
• Build Histograms (Classified)
– .aux
• Build Pyramids
– .rrd
– .xml
• For species output grids
– Convert ASCII to Raster (Output Data Type = FLOATING)
– Output as .bil (Band interleaved by line)
MORE ADVANCED PARAMETERS
Running MaxEnt
REPLICATE RUNS
Running MaxEnt
BATCH MODE
Running MaxEnt
Preparation of Data
Scott Story
Required Inputs
• Species presence localities (“samples”) file
• Environmental feature layers
• Output directory
Getting Feature Data Ready
• Same projection (coordinate system, units, datum)
• Same resolution
• Same extent
• ESRI ASCII format
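Matching resolution and extent can be checked before MaxEnt is run by comparing the header lines of the ESRI ASCII files. The sketch below assumes the common header keys ncols, nrows, xllcorner, yllcorner and cellsize (files that use xllcenter/yllcenter are not handled); the function names are hypothetical.

```python
HEADER_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def read_ascii_header(stream):
    """Read the leading 'key value' lines of an ESRI ASCII grid."""
    header = {}
    for line in stream:
        parts = line.split()
        # Header lines look like 'ncols 4'; the first data row ends the header.
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
    return header


def grids_match(header_a, header_b, tolerance=1e-9):
    """True when both grids share dimensions, origin and cell size."""
    return all(
        key in header_a
        and key in header_b
        and abs(header_a[key] - header_b[key]) <= tolerance
        for key in HEADER_KEYS
    )
```

In practice each layer would be opened with open(path) and compared against one reference layer; any layer that fails grids_match needs to be reprojected or resampled first.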
Two Raster Datasets
Land cover
• Source = Montana Natural Heritage Program
• Type = IMAGINE Image
• Cell size = 30 meters
• Columns & Rows = 33005, 24008
• Spatial Reference = Montana State Plane (NAD83)
• Pixel Type = Unsigned Integer (8-bit)
Precipitation
• Source = PRISM Climate Center
• Type = ASCII grid
• Cell size = 0.0083333333
• Columns & Rows = 7025, 3105
• Spatial Reference = undefined (see metadata)
• Pixel Type = Signed Integer (32-bit)
Two Raster Datasets
Land cover Precipitation
Making Rasters Match
• Define coordinate systems for both
• Set some environment variables
– Tools > Options > Geoprocessing Tab > Environments
– General Settings: Extent and Snap Raster
– Raster Analysis Settings: Cell Size, Mask
• Project Raster
– Select target raster to match for output cell size
Precipitation Reprojected & Resampled
• Same exact extent
• Same exact number of rows & columns
• Same exact cell size
• Real test is…does Maxent throw any errors?
• In this case…it worked!
• Getting all your data layers squared away will take some time!
Deriving New Raster Data - Ruggedness
Types of Environmental Features
• Continuous (Quantitative)
– Interval-scale (interval data, order, linear scale)
– Ordinal variables (scale unknown-transformed?, rank clear)
– Ratio-scale (interval data, ordered, not on linear scale, e.g. temp on F or C scale)
• Categorical (Qualitative)
– Nominal (e.g. gender)
– Ordinal (has order, e.g. low to great)
– Dummy variables from quantitative (classes)
• Name the ASCII files with CONT or CAT prefix
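With the CONT/CAT naming convention, the type of each layer can be read off its file name. A minimal Python sketch, assuming the prefix is matched case-insensitively; the function name and the example file names are hypothetical.

```python
from pathlib import Path


def layer_type(filename):
    """Classify a layer from its CONT or CAT file-name prefix."""
    stem = Path(filename).stem.upper()
    if stem.startswith("CAT"):
        return "categorical"
    if stem.startswith("CONT"):
        return "continuous"
    return "unknown"


for name in ["CAT_landcover.asc", "CONT_precip.asc", "ruggedness.asc"]:
    print(name, "->", layer_type(name))
```

Layers reported as "unknown" are the ones that still need renaming before their type can be set by prefix.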
Preparing Point Data
• Create a separate file for each species
• Or combine all of them, or groups of them, into one file
• Probably want to retain a unique identifier
• May want to set up scripts in ArcGIS to extract presence data
• Might also want more control of how background data is selected
• Let’s look at an example script - ExtractModelInputData.py
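As an illustration of the combined-file option, the sketch below writes presence records for several species into one comma-separated samples file. It is not the ExtractModelInputData.py script, which is not reproduced here. The column order species, x, y and the header names are assumptions based on the MaxEnt tutorial samples file, and record identifiers are assumed to be tracked elsewhere.

```python
import csv


def write_samples(records, stream):
    """Write (species, x, y) tuples as one samples file, grouped by species."""
    writer = csv.writer(stream, lineterminator="\n")
    # Header names are an assumption; the order species, x, y is what matters here.
    writer.writerow(["species", "longitude", "latitude"])
    for species, x, y in sorted(records):
        writer.writerow([species, x, y])
```

A call such as write_samples(records, open("samples.csv", "w", newline="")) would produce the single file; the file name is hypothetical.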
Other “Feature” Layers
• Masks
– useful if you want to train a model using only a subset of the region
– mask.asc containing a constant value (1, for example) in the area of interest and no-data values everywhere else
• Bias
– default assumption is that species occurrence data are unbiased
– a bias layer requires a good understanding of the spatial pattern of sampling
– values should indicate relative sampling effort
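A mask layer of the kind described above is just an ESRI ASCII grid holding 1 inside the area of interest and the no-data value elsewhere. A minimal Python sketch, assuming the area of interest is already available as rows of True/False cells; the function name and the -9999 no-data value are assumptions.

```python
def mask_to_ascii(inside, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Render rows of booleans as ESRI ASCII text: 1 inside, NODATA outside."""
    lines = [
        f"ncols {len(inside[0])}",
        f"nrows {len(inside)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for row in inside:
        lines.append(" ".join("1" if cell else str(nodata) for cell in row))
    return "\n".join(lines) + "\n"
```

The header values must be copied from the other feature layers so that the mask has the same extent and cell size as they do.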
THRESHOLDS
Representing the output
Logistic Output (Ranges 0-1)
Reclassify with ArcGIS
Preset MaxEnt Thresholds
Description | Cumulative Threshold | Logistic Threshold | Fractional Predicted Area | Training Omission Rate | Test Omission Rate
Fixed Cumulative Value 1 | 1 | 0.043 | 0.344 | 0.002 | 0.000
Fixed Cumulative Value 5 | 5 | 0.172 | 0.255 | 0.020 | 0.020
Fixed Cumulative Value 10 | 10 | 0.260 | 0.210 | 0.044 | 0.082
Minimum Training Presence | 0.699 | 0.029 | 0.365 | 0.000 | 0.000
10 Percentile Training Presence | 17.522 | 0.351 | 0.167 | 0.099 | 0.151
Equal Training Sensitivity & Specificity | 21.989 | 0.393 | 0.149 | 0.148 | 0.205
Maximum Training Sensitivity Plus Specificity | 9.201 | 0.248 | 0.216 | 0.035 | 0.065
Equal Test Sensitivity & Specificity | 18.603 | 0.361 | 0.162 | 0.106 | 0.162
Maximum Test Sensitivity Plus Specificity | 7.729 | 0.225 | 0.228 | 0.029 | 0.043
Balance Training Omission, Predicted Area, & Threshold Value | 1.054 | 0.047 | 0.342 | 0.002 | 0.000
Equate Entropy of Thresholded & Original Distributions | 5.465 | 0.182 | 0.250 | 0.021 | 0.026
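Applying one of these thresholds turns the logistic output into ones and zeros, and the omission rate and fractional predicted area follow from simple counts. The Python sketch below works on plain lists rather than rasters. Treating values equal to the threshold as predicted presence is an assumption, and the function names are hypothetical.

```python
def reclassify(logistic_values, threshold):
    """1 where the logistic value meets the threshold, 0 otherwise."""
    return [1 if value >= threshold else 0 for value in logistic_values]


def fractional_predicted_area(logistic_values, threshold):
    """Share of pixels predicted as suitable at this threshold."""
    binary = reclassify(logistic_values, threshold)
    return sum(binary) / len(binary)


def omission_rate(presence_values, threshold):
    """Share of presence points that fall below the threshold."""
    missed = sum(1 for value in presence_values if value < threshold)
    return missed / len(presence_values)


# 0.351 is the 10 Percentile Training Presence logistic threshold from the table.
print(reclassify([0.10, 0.351, 0.80, 0.02], 0.351))
```

Raising the threshold shrinks the predicted area and raises the omission rate, which is the trade-off the table rows express.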
Thresholds – Ends of Spectrum
Balance Training Omission, Predicted Area, & Threshold Value
Equal Training Sensitivity & Specificity
MODEL VALIDATION
Model Validation
Validation Metrics
• Receiver Operating Characteristic (ROC) Curve – obtained by plotting, for each threshold, the proportion of true positives against the proportion of false positives
• Area Under Curve (AUC) – the area under the ROC curve
• Deviance – minus 2 times the log probability of the test data
• Absolute Validation Index – the proportion of presence evaluation points falling above the threshold or within the GAP predicted distribution
• Point Biserial Correlation – the correlation between a model’s predictions and presence/absence in test data (regarded as a 0/1 variable)
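Three of these metrics can be computed from predicted values alone. The Python sketch below is illustrative rather than MaxEnt's own code. AUC is computed by the rank-comparison shortcut, which equals the area under the ROC curve, and the comparison points are assumed to be background or absence points. The function names are hypothetical.

```python
import math


def auc(presence_scores, background_scores):
    """Chance that a random presence outranks a random background point (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def absolute_validation_index(presence_scores, threshold):
    """Proportion of presence evaluation points at or above the threshold."""
    hits = sum(1 for p in presence_scores if p >= threshold)
    return hits / len(presence_scores)


def point_biserial(scores, labels):
    """Pearson correlation between predictions and 0/1 presence labels."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_l = sum(labels) / n
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(scores, labels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return cov / math.sqrt(var_s * var_l)
```

The pairwise loop in auc is quadratic, which is fine for a few thousand points; a rank-based version would be needed for very large background samples.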
_samplePredictions.csv
Discussion Point
Topics Left
• Data Prep
• Output
• Thresholds
• Validation
• Batch
• Replicates