
Palaeontologia Electronica http://palaeo-electronica.org

PAST: PALEONTOLOGICAL STATISTICS SOFTWARE PACKAGE FOR EDUCATION AND DATA ANALYSIS

Øyvind Hammer, David A.T. Harper, and Paul D. Ryan

Øyvind Hammer. Paleontological Museum, University of Oslo, Sars gate 1, 0562 Oslo, Norway
David A. T. Harper. Geological Museum, Øster Voldgade 5-7, University of Copenhagen, DK-1350 Copenhagen K, Denmark
Paul D. Ryan. Department of Geology, National University of Ireland, Galway, Ireland

ABSTRACT

A comprehensive but simple-to-use software package has been developed for executing a range of standard numerical analyses and operations used in quantitative paleontology. The program, called PAST (PAleontological STatistics), runs on standard Windows computers and is available free of charge. PAST integrates spreadsheet-type data entry with univariate and multivariate statistics, curve fitting, time-series analysis, data plotting, and simple phylogenetic analysis. Many of the functions are specific to paleontology and ecology and are not found in standard, more extensive statistical packages. PAST also includes fourteen case studies (data files and exercises) illustrating use of the program for paleontological problems, making it a complete educational package for courses in quantitative methods.

KEY WORDS: Software, data analysis, education

Copyright: Palaeontological Association, 22 June 2001
Submission: 28 February 2001
Acceptance: 13 May 2001

INTRODUCTION

Even a cursory glance at the recent paleontological literature should convince anyone that quantitative methods in paleontology have arrived at last. Nevertheless, many paleontologists still hesitate to apply such methods to their own data. One reason for this has been the difficulty of acquiring and using appropriate data-analysis software. The ‘PALSTAT’ program was developed in the 1980s to minimize such obstacles and to provide students with a coherent, easy-to-use package that supported a wide range of algorithms while allowing hands-on experience with quantitative methods. The first PALSTAT version was programmed for the BBC microcomputer (Harper and Ryan 1987), while later revisions were made for the PC (Ryan et al. 1995). Incorporating univariate and multivariate statistics and other plotting and analytical functions specific to paleontology and ecology, PALSTAT gained a wide user base among both paleontologists and biologists.

Hammer, Øyvind, Harper, David A.T., and Paul D. Ryan, 2001. Past: Paleontological Statistics Software Package for Education and Data Analysis. Palaeontologia Electronica, vol. 4, issue 1, art. 4: 9pp., 178kb. http://palaeo-electronica.org/2001_1/past/issue1_01.htm.

After some years of service, however, it was becoming clear that PALSTAT had to undergo major revision. The DOS-based user interface and an architecture designed for computers with minuscule memories (by modern standards) were becoming an obstacle for most users. Also, the field of quantitative paleontology has changed and expanded considerably in the last 15 years, requiring the implementation of many new algorithms. Therefore, in 1999 we decided to redesign the program completely, keeping the general concept but without concern for the original source code. The new program, called PAST (PAleontological STatistics), takes full advantage of the Windows operating system, with a modern, spreadsheet-based user interface and extensive graphics. Most PAST algorithms produce graphical output automatically, and the high-quality figures can be printed or pasted into other programs. The functionality has been extended substantially by including important algorithms in the standard PAST toolbox. Functions found in PAST that were not available in PALSTAT include (but are not limited to) parsimony analysis with cladogram plotting, detrended correspondence analysis, principal coordinates analysis, time-series analysis (spectral and autocorrelation), geometrical analysis (point distribution and Fourier shape analysis), rarefaction, modelling by nonlinear functions (e.g., logistic curve, sum-of-sines), and quantitative biostratigraphy using the unitary associations method. We believe that the functions we have implemented reflect the present practice of paleontological data analysis, with the exception of some functionality that we hope to include in future versions (e.g., morphometric analysis with landmark data and more methods for the validation and correction of diversity curves).

One of the main ideas behind PAST is to include many functions in a single program package while providing a consistent user interface. This minimizes the time spent searching for, buying, and learning a new program each time a new method is approached. Similar projects are being undertaken in other fields (e.g., systematics and morphometry). One example is Wayne Maddison’s ‘Mesquite’ package (http://mesquite.biosci.arizona.edu/mesquite/mesquite.html).

An important aspect of PALSTAT was the inclusion of case studies, including data sets designed to illustrate possible uses of the algorithms. Working through these examples allowed students to obtain a practical overview of the different methodologies very efficiently. Some of these case studies have been adjusted and included in PAST, and new case studies have been added to demonstrate the new features. The case studies are primarily designed as student exercises for courses in paleontological data analysis. The PAST program, documentation, and case studies are available free of charge at http://www.nhm.uio.no/~ohammer/past.

PLOTTING AND BASIC STATISTICS

Graphical plotting functions (see http://www.nhm.uio.no/~ohammer/past/plot.html) in PAST include different types of graph, histogram, and scatter plots. The program can also produce ternary (triangle) plots and survivorship curves.

Descriptive statistics (see http://www.nhm.uio.no/~ohammer/past/univar.html) include minimum, maximum, and mean values, population variance, sample variance, population and sample standard deviations, median, skewness, and kurtosis.


For associations or paleocommunity data, several diversity statistics can be computed: number of taxa, number of individuals, dominance, Simpson index, Shannon index (entropy), Menhinick’s and Margalef’s richness indices, equitability, and Fisher’s α (Harper 1999).
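The indices listed above can be sketched in a few lines. This is a minimal illustration assuming the usual textbook definitions (dominance as the sum of squared proportions, Simpson as 1 minus dominance, Menhinick as S/√n, Margalef as (S−1)/ln n, equitability as Shannon H divided by ln S, and Fisher’s α as the root of S = α ln(1 + n/α)); PAST’s exact formulas may differ in detail. The function names `diversity` and `fisher_alpha` are hypothetical.

```python
import math

def diversity(counts):
    """Common diversity indices for one sample of taxon abundances (assumed textbook forms)."""
    counts = [c for c in counts if c > 0]      # ignore taxa absent from the sample
    n = sum(counts)                            # number of individuals
    s = len(counts)                            # number of taxa
    p = [c / n for c in counts]                # relative abundances
    dominance = sum(pi * pi for pi in p)
    shannon = -sum(pi * math.log(pi) for pi in p)
    return {
        "taxa": s,
        "individuals": n,
        "dominance": dominance,
        "simpson": 1.0 - dominance,
        "shannon": shannon,
        "menhinick": s / math.sqrt(n),
        "margalef": (s - 1) / math.log(n),
        "equitability": shannon / math.log(s),
        "fisher_alpha": fisher_alpha(s, n),
    }

def fisher_alpha(s, n, tol=1e-10):
    """Solve S = alpha * ln(1 + n / alpha) for alpha by bisection (requires S < n)."""
    if s >= n:
        raise ValueError("Fisher's alpha is undefined when every individual is a new taxon")
    f = lambda a: a * math.log(1.0 + n / a) - s
    lo, hi = 1e-9, 1.0
    while f(hi) < 0:          # f increases with alpha: expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(diversity([10, 5, 3, 1, 1]))
```

Equitability and Margalef’s index are undefined for a single taxon or a single individual, so real data would need guards for those cases.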

Rarefaction (Krebs 1989) is a method for estimating the number of taxa expected in a small sample when abundance data for a larger sample are given. With this method, the number of taxa in samples of different sizes can be compared. An example application of rarefaction in paleontology is given by Adrain et al. (2000).
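One common formulation of rarefaction (the hypergeometric expectation given in ecological texts such as Krebs 1989) sums, over all taxa, the probability that a taxon appears at least once in a random subsample of n individuals drawn without replacement. The sketch below assumes that formulation; `rarefy` is a hypothetical helper, not a PAST function.

```python
from math import comb

def rarefy(counts, n):
    """Expected number of taxa in a random subsample of n individuals (without replacement)."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("subsample size must be between 1 and the total count")
    denom = comb(total, n)
    # 1 - P(taxon absent from the subsample); comb() returns 0 when n > total - c
    return sum(1.0 - comb(total - c, n) / denom for c in counts if c > 0)

counts = [10, 5, 3, 1, 1]
curve = [rarefy(counts, n) for n in range(1, sum(counts) + 1)]   # a rarefaction curve
```

Plotting `curve` against subsample size gives the familiar rarefaction curve, which lets samples of different sizes be compared at a common n.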

The program also includes standard statistical tests (see http://www.nhm.uio.no/~ohammer/past/twosets.html) for univariate data, including tests for normality (chi-squared and Shapiro-Wilk), the F and t tests, one-way ANOVA, χ² for comparing binned samples, the Mann-Whitney U test and the Kolmogorov-Smirnov association test (non-parametric), and both Spearman’s ρ and Kendall’s τ non-parametric rank-order tests. Dice and Jaccard similarity indices are used for comparing associations limited to presence/absence data. The Raup-Crick randomization method for comparing associations (Raup and Crick 1979) is also implemented. Finally, the program can compute correlation matrices and perform contingency-table analysis.
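For presence/absence data, the Dice and Jaccard indices depend only on the number of taxa shared by two samples (a) and the number found in only one of them (b and c); joint absences are ignored. A minimal sketch, assuming 1 codes presence and 0 absence (`presence_similarity` is a hypothetical name):

```python
def presence_similarity(x, y):
    """Dice and Jaccard similarity between two presence/absence vectors (1 = present)."""
    a = sum(1 for u, v in zip(x, y) if u and v)        # taxa present in both samples
    b = sum(1 for u, v in zip(x, y) if u and not v)    # only in the first sample
    c = sum(1 for u, v in zip(x, y) if v and not u)    # only in the second sample
    if a + b + c == 0:
        raise ValueError("both samples are empty")
    return {"dice": 2 * a / (2 * a + b + c), "jaccard": a / (a + b + c)}

print(presence_similarity([1, 1, 0, 1, 0], [1, 0, 1, 1, 0]))   # dice ~0.667, jaccard 0.5
```

Dice weights shared taxa twice, so it is always at least as large as Jaccard for the same pair of samples.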

MULTIVARIATE ANALYSIS

Paleontological data sets, whether based on fossil occurrences or morphology, often have high dimensionality. PAST includes several methods for multivariate data analysis (see http://www.nhm.uio.no/~ohammer/past/multivar.html), including methods that are specific to paleontology and biology.

Principal components analysis (PCA) is a procedure for finding hypothetical variables (components) that account for as much of the variance in a multidimensional data set as possible (Davis 1986, Harper 1999). These new variables are linear combinations of the original variables. PCA is a standard method for reducing the dimensionality of morphometric and ecological data. The PCA routine finds the eigenvalues and eigenvectors of the variance-covariance matrix or the correlation matrix. The eigenvalues, giving a measure of the variance accounted for by the corresponding eigenvectors (components), are displayed together with the percentage of variance accounted for by each component. A scatter plot of the data projected onto the principal components is provided, along with the option of including the Minimal Spanning Tree, which is the shortest possible set of connected lines joining all points. This may be used as a visual aid in grouping close points (Harper 1999). The component loadings can also be plotted. Bruton and Owen (1988) describe a typical morphometric application of PCA.

Principal coordinates analysis (PCO) is another ordination method, somewhat similar to PCA. The PCO routine finds the eigenvalues and eigenvectors of a matrix containing the distances between all data points, measured with the Gower distance or the Euclidean distance. The PCO algorithm used in PAST was taken from Davis (1986), which also includes a more detailed description of the method and an example analysis.

Correspondence analysis (CA) is a further ordination method, somewhat similar to PCA, but for counted or discrete data. Correspondence analysis can compare associations containing counts of taxa, or counted taxa across associations. Also, CA is more suitable if species are expected to have unimodal responses to the underlying parameters, that is, they favor a certain range of the parameter and become rare for lower and higher values (in contrast to PCA, which assumes a linear response). The CA algorithm employed in PAST is taken from Davis (1986), which also includes a more detailed description of the method and an example analysis. Ordination of both samples and taxa can be plotted in the same CA coordinate system, whose axes will normally be interpreted in terms of environmental parameters (e.g., water depth, type of substrate, temperature).

The Detrended Correspondence Analysis (DCA) module uses the same ‘reciprocal averaging’ algorithm as the program Decorana (Hill and Gauch 1980). It is specialized for use on “ecological” data sets with abundance data (taxa in rows, localities in columns), and it has become a standard method for studying gradients in such data. Detrending is a normalization procedure in two steps. The first step attempts to “straighten out” points lying along an arch-like pattern (= Kendall’s Horseshoe). The second step “spreads out” the points to avoid artificial clustering at the edges of the plot.

Hierarchical clustering routines produce a dendrogram showing how and where data points can be clustered (Davis 1986, Harper 1999). Clustering is one of the most commonly used methods of multivariate data analysis in paleontology. Both R-mode clustering (groupings of taxa) and Q-mode clustering (grouping variables or associations) can be carried out within PAST by transposing the data matrix. Three different clustering algorithms are available: the unweighted pair-group average (UPGMA) algorithm, the single linkage (nearest neighbor) algorithm, and Ward’s method. The similarity-association matrix upon which the clusters are based can be computed using nine different indices: Euclidean distance, correlation (using Pearson’s r or Spearman’s ρ), Bray-Curtis, chord and Morisita indices for abundance data, and Dice, Jaccard, and Raup-Crick indices for presence-absence data.

Seriation of an absence-presence matrix can be performed using the algorithm described by Brower and Kyle (1988). For constrained seriation, columns should be ordered according to some external criterion (normally stratigraphic level) or positioned along a presumed faunal gradient. Seriation routines attempt to reorganize the data matrix such that the presences are concentrated along the diagonal. Also, in the constrained mode, the program runs a ‘Monte Carlo’ simulation to determine whether the original matrix is more informative than a random matrix. In the unconstrained mode, both rows and columns are free to move: the method then amounts to a simple form of ordination.

The degree of separation between two hypothesized groups (e.g., species or morphs) can be investigated using discriminant analysis (Davis 1986). Given two sets of multivariate data, an axis is constructed that maximizes the differences between the sets. The two sets are then plotted along this axis using a histogram. The null hypothesis of equality of group means is tested using Hotelling’s T² test.
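In the two-group case, the discriminant axis is proportional to S⁻¹(x̄₁ − x̄₂), where S is the pooled variance-covariance matrix, and Hotelling’s T² is n₁n₂/(n₁+n₂) times the squared Mahalanobis distance between the group means. The sketch below follows those standard definitions (names hypothetical). It converts T² to an F statistic with (p, n₁+n₂−p−1) degrees of freedom but does not compute the p-value, because the Python standard library has no F distribution.

```python
def _solve(m, v):
    """Solve the linear system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def hotelling_t2(g1, g2):
    """Two-sample Hotelling's T^2, its F equivalent, and the discriminant axis w."""
    n1, n2, p = len(g1), len(g2), len(g1[0])
    def mean(g):
        return [sum(r[k] for r in g) / len(g) for k in range(p)]
    def scatter(g, m):
        return [[sum((r[i] - m[i]) * (r[k] - m[k]) for r in g) for k in range(p)]
                for i in range(p)]
    m1, m2 = mean(g1), mean(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    pooled = [[(s1[i][k] + s2[i][k]) / (n1 + n2 - 2) for k in range(p)] for i in range(p)]
    d = [a - b for a, b in zip(m1, m2)]
    w = _solve(pooled, d)                               # discriminant axis direction
    t2 = n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2    # F with (p, n1+n2-p-1) d.f.
    return t2, f, w
```

Projecting each specimen onto `w` (the dot product of its measurements with `w`) gives the values that would be plotted in the histogram. With a single variable, T² reduces to the square of the pooled two-sample t statistic.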

CURVE FITTING AND TIME-SERIES ANALYSIS

Curve fitting (see http://www.nhm.uio.no/~ohammer/past/fitting.html) in PAST includes a range of linear and nonlinear functions.

Linear regression can be performed with two different algorithms: standard (least-squares) regression and the “Reduced Major Axis” method. Least-squares regression keeps the x values fixed and finds the line that minimizes the squared errors in the y values. Reduced Major Axis minimizes both the x and the y errors simultaneously. Both x and y values can also be log-transformed, in effect fitting the data to the “allometric” function $y = 10^{b} x^{a}$. An allometric slope value around 1.0 indicates that an “isometric” fit may be more applicable to the data than an allometric fit. Values for the regression slope and intercept, their errors, a χ² correlation value, Pearson’s r coefficient, and the probability that the columns are not correlated are given.
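The difference between the two line-fitting methods shows up in the slope: least squares uses Sxy/Sxx, while Reduced Major Axis uses sign(Sxy)·√(Syy/Sxx), which equals the least-squares slope divided by |r|. Taking log10 of both variables turns $y = 10^{b} x^{a}$ into the straight line log y = a log x + b. The sketch below assumes these standard formulas and omits the error estimates and significance test; function names are hypothetical, and the choice of RMA as the default for the allometric fit is ours.

```python
import math

def regressions(x, y):
    """Least-squares and Reduced Major Axis lines for paired data: (slope, intercept) each."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ols = sxy / sxx                                   # minimizes errors in y only
    rma = math.copysign(math.sqrt(syy / sxx), sxy)    # geometric mean of y-on-x and 1/(x-on-y)
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "ols": (ols, my - ols * mx),
        "rma": (rma, my - rma * mx),
    }

def allometric_fit(x, y, method="rma"):
    """Fit y = 10**b * x**a as a straight line on log10-transformed data; returns (a, b)."""
    logs = regressions([math.log10(v) for v in x], [math.log10(v) for v in y])
    return logs[method]

fit = regressions([1, 2, 3, 4], [2, 3, 5, 4])   # r = 0.8, OLS slope 0.8, RMA slope 1.0
```

When the points are perfectly correlated (|r| = 1) the two methods give the same line; the weaker the correlation, the more the least-squares slope is pulled toward zero relative to RMA.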

In addition, the sum of up to six sinusoids (not necessarily harmonically related) with frequencies specified by the user, but with unknown amplitudes and phases, can be fitted to bivariate data. This method can be useful for modeling periodicities in time series, such as annual growth cycles or climatic cycles, usually in combination with spectral analysis (see below). The algorithm is based on a least-squares criterion and singular value decomposition (Press et al. 1992). Frequencies can also be estimated by trial and error, adjusting each frequency so that its amplitude is maximized.
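Because the frequencies are fixed, the model c + Σ [aₖ cos(2πfₖt) + bₖ sin(2πfₖt)] is linear in its unknowns, so ordinary least squares suffices; amplitude and phase follow as Aₖ = √(aₖ² + bₖ²) and φₖ = atan2(bₖ, aₖ). Note that this sketch solves the normal equations by Gaussian elimination rather than using singular value decomposition as described above; that is adequate for a few well-separated frequencies but less robust when frequencies are close together. Names are hypothetical.

```python
import math

def _solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_sinusoids(t, y, freqs):
    """Least-squares fit of y = c + sum_k A_k cos(2*pi*f_k*t - phi_k) for fixed f_k.
    Returns (c, [(A_k, phi_k), ...])."""
    def basis(ti):
        row = [1.0]
        for f in freqs:
            w = 2.0 * math.pi * f * ti
            row += [math.cos(w), math.sin(w)]
        return row
    design = [basis(ti) for ti in t]
    p = len(design[0])
    # normal equations (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    comps = []
    for k in range(len(freqs)):
        a, b = beta[1 + 2 * k], beta[2 + 2 * k]
        comps.append((math.hypot(a, b), math.atan2(b, a)))   # (amplitude, phase)
    return beta[0], comps
```

The "trial and error" search mentioned above could be emulated by calling `fit_sinusoids` over a grid of candidate frequencies and keeping the one with the largest amplitude.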

Further, PAST allows fitting of data to the logistic equation $y = a/(1 + b e^{-cx})$, using Levenberg-Marquardt nonlinear optimization (Press et al. 1992). The logistic equation can model growth with saturation, and it was used by Sepkoski (1984) to describe the proposed stabilization of marine diversity in the late Paleozoic. Another option is fitting to the von Bertalanffy growth equation $y = a(1 - b e^{-cx})$. This equation is used for modeling growth of multicellular animals (Brown and Rothery 1993).

Searching for periodicities in time series (data sampled as a function of time) has been an important and controversial subject in paleontology in the last few decades, and we have therefore implemented two methods for such analysis in the program: spectral analysis and autocorrelation. Spectral (harmonic) analysis of time series can be performed using the Lomb periodogram algorithm, which is more appropriate than the standard Fast Fourier Transform for paleontological data, since these are often unevenly sampled (Press et al. 1992). Evenly spaced data are, of course, also accepted. In addition to the plot of the periodogram, the highest peak in the spectrum is presented with its frequency and power value, together with the probability that such a peak could arise from random data. The data set can optionally be detrended (linear component removed) prior to analysis. Applications include detection of Milankovitch cycles in isotopic data (Muller and MacDonald 2000) and searching for periodicities in diversity curves (Raup and Sepkoski 1984). Autocorrelation (Davis 1986) can be carried out on evenly sampled temporal-stratigraphical data. An autocorrelation close to zero at all lags signifies random data, while periodicities show up as peaks.
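The Lomb periodogram evaluates, at each trial frequency, how much variance a sinusoid of that frequency explains, using a time offset τ that keeps the sine and cosine terms orthogonal on uneven sample times. The sketch below follows the normalized formulation in Press et al. (1992). The false-alarm helper uses the standard approximation 1 − (1 − e^(−z))^M, where the number of independent frequencies M is an input you must supply (setting it to the number of data points is a common rough choice, and an assumption here). The autocorrelation function assumes evenly spaced data. All names are hypothetical.

```python
import math

def lomb(t, y, freqs):
    """Normalized Lomb periodogram for unevenly spaced samples; freqs in cycles per unit of t."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # offset tau makes the sine and cosine terms orthogonal on the sample times
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum((v - mean) * ci for v, ci in zip(y, c))
        ys = sum((v - mean) * si for v, si in zip(y, s))
        power.append((yc ** 2 / sum(ci * ci for ci in c) + ys ** 2 / sum(si * si for si in s))
                     / (2 * var))
    return power

def false_alarm(z, m):
    """Chance that pure noise gives a peak >= z at any of m independent frequencies."""
    return 1.0 - (1.0 - math.exp(-z)) ** m

def autocorrelation(y, max_lag):
    """Sample autocorrelation of an evenly spaced series for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / denom for k in range(max_lag + 1)]
```

Detrending, as offered in PAST, would amount to subtracting a least-squares line from `y` before calling `lomb`.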

GEOMETRICAL ANALYSIS

PAST includes some functionality for geometrical analysis (see http://www.nhm.uio.no/~ohammer/past/morpho.html), even though an extensive morphometrics module has not yet been implemented. We hope to implement more extensive functionality, such as landmark-based methods, in future versions of the program.

The program can plot rose diagrams (polar histograms) of directions. These can be used for plotting current-oriented specimens, orientations of trackways, orientations of morphological features (e.g., trilobite terrace lines), etc. The mean angle together with Rayleigh’s spread are given. Rayleigh’s spread is further tested against a random distribution using Rayleigh’s test for directional data (Davis 1986). A χ² test is also available, giving the probability that the directions are randomly and evenly distributed.
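The mean direction and the Rayleigh test both come from the resultant of unit vectors pointing in each measured direction. This sketch assumes that "Rayleigh’s spread" corresponds to the mean resultant length R̄ (near 1 for tightly aligned directions, near 0 for scattered ones). It uses the simple large-sample approximation p ≈ exp(−nR̄²), which is crude for very small samples and may differ from the procedure in PAST. Names are hypothetical.

```python
import math

def rayleigh(angles_deg):
    """Mean direction (degrees), mean resultant length R-bar, approximate Rayleigh p-value."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    mean_angle = math.degrees(math.atan2(s, c)) % 360.0
    r_bar = math.hypot(c, s) / n
    z = n * r_bar ** 2          # Rayleigh's Z statistic
    p = math.exp(-z)            # first-order approximation; poor for very small n
    return mean_angle, r_bar, p

print(rayleigh([10, 20, 30]))   # mean direction 20 degrees, R-bar close to 1
```

For axial data such as undirected trackway orientations, a common convention is to double the angles before the test and halve the resulting mean direction.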

Point distribution statistics using nearest neighbor analysis (modified from Davis 1986) are also provided. The area is estimated using the convex hull, which is the smallest convex polygon enclosing the points. The probability that the distribution is random (Poisson process, giving an exponential nearest neighbor distribution) is presented, together with the ‘R’ value. Clustered points give R < 1, Poisson patterns give R ≈ 1, and over-dispersed points give R > 1. Applications of this module include spatial ecology (are in situ brachiopods clustered?) and morphology (are trilobite tubercles over-dispersed? see Hammer 2000).
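The R value compares the observed mean nearest-neighbor distance with the value 0.5/√(n/A) expected for a random (Poisson) pattern of the same density, where A is the convex-hull area. This sketch assumes that Clark-Evans form of R, applies no edge correction, and omits the significance test, so its numbers may differ from PAST’s. Names are hypothetical.

```python
import math

def convex_hull(points):
    """Convex hull of 2-D points (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])))

def nearest_neighbor_r(points):
    """Ratio of observed to expected (Poisson) mean nearest-neighbor distance."""
    n = len(points)
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    area = polygon_area(convex_hull(points))
    expected = 0.5 / math.sqrt(n / area)
    return (total / n) / expected

grid = [(x, y) for x in range(5) for y in range(5)]
print(nearest_neighbor_r(grid))   # a regular grid is over-dispersed: R = 2.5
```

Without edge correction, points near the hull boundary have fewer potential neighbors, which tends to inflate R slightly for small samples.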

The Fourier shape analysis module (Davis 1986) accepts x-y coordinates digitized around an outline. More than one shape can be analyzed simultaneously. Points do not need to be evenly spaced. The sine and cosine components are given for the first ten harmonics, and the coefficients can then be copied to the main spreadsheet for further analysis (e.g., by PCA). Elliptic Fourier shape analysis is also provided (Kuhl and Giardina 1982). For an application of elliptic Fourier shape analysis in paleontology, see Renaud et al. (1996).

PHYLOGENETIC ANALYSIS (PARSIMONY)

The cladistics package (see http://www.nhm.uio.no/~ohammer/past/cladist.html) in PAST is fully operational but lacks comprehensive functionality. For example, there is no character reconstruction (plotting of steps on the cladogram). The use of PAST in parsimony analysis should probably be limited to entry-level education and preliminary investigations. The parsimony algorithms used in PAST are from Kitching et al. (1998).

Character states are coded using integers in the range 0 to 255. The first taxon is treated as the outgroup and will be placed at the root of the tree. Missing values are coded with a question mark. Four algorithms are available for finding short trees: branch-and-bound (finds all shortest trees), exhaustive (finds all shortest trees and allows the plotting of the tree-length distribution), heuristic nearest neighbor interchange (NNI), and heuristic subtree pruning and regrafting (SPR). Three different optimality criteria are available: Wagner (reversible and ordered characters), Fitch (reversible and unordered characters), and Dollo (irreversible and ordered). Bootstrapping can be performed with a given number of replicates.
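Every tree-search strategy listed above needs a way to score a candidate tree. Under the Fitch criterion, each internal node receives the intersection of its children’s state sets if they overlap, or else their union at the cost of one step; the tree length is the sum over characters. The sketch below scores only a single given rooted binary tree; the search algorithms and the Wagner and Dollo criteria are not shown. Treating ‘?’ as “any observed state” is a common convention and an assumption here. Names are hypothetical.

```python
def fitch_steps(tree, states):
    """Fitch count of the minimum number of changes of one unordered character.
    tree: nested 2-tuples of taxon names; states: taxon -> set of possible states."""
    steps = 0
    def visit(node):
        nonlocal steps
        if isinstance(node, str):                  # a terminal taxon
            return states[node]
        left, right = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:                                 # no change needed at this node
            return shared
        steps += 1                                 # disjoint sets cost one step
        return left | right
    visit(tree)
    return steps

def tree_length(tree, matrix):
    """Total Fitch length for a character matrix (taxon -> list of states, '?' = missing)."""
    taxa = list(matrix)
    total = 0
    for k in range(len(matrix[taxa[0]])):
        observed = {matrix[t][k] for t in taxa if matrix[t][k] != "?"}
        if not observed:
            continue
        # a missing entry may take any observed state
        states = {t: observed if matrix[t][k] == "?" else {matrix[t][k]} for t in taxa}
        total += fitch_steps(tree, states)
    return total

matrix = {"A": [0, 0], "B": [0, 1], "C": [1, 0], "D": [1, 1]}
print(tree_length((("A", "B"), ("C", "D")), matrix))   # 3 steps
```

An exhaustive search would call `tree_length` on every possible topology and keep those with the minimum score; branch-and-bound prunes partial trees whose length already exceeds the best found so far.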

All shortest (most parsimonious) trees can be viewed. If bootstrapping has been performed, a bootstrap value is given at the root of the subtree specifying each group.

The consensus tree of all shortest (most parsimonious) trees can also be viewed. Two consensus rules are implemented: strict (groups must be supported by all trees) and majority (groups must be supported by more than 50% of the trees). PAST can read and export files in the NEXUS format, making it compatible with packages such as PAUP and MacClade.
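Both consensus rules reduce to counting how many trees contain each group (the set of taxa descending from a node). The sketch below represents trees as nested tuples and returns the groups that pass each rule. Turning those groups back into a drawn tree is omitted, and the function names are hypothetical.

```python
from collections import Counter

def clades(tree):
    """All groups (frozensets of taxon names) defined by internal nodes of a nested-tuple tree."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members
    walk(tree)
    return found

def consensus_groups(trees, rule="majority"):
    """Groups in all trees (strict) or in more than half of them (majority)."""
    counts = Counter(c for t in trees for c in clades(t))
    n = len(trees)
    if rule == "strict":
        return {c for c, k in counts.items() if k == n}
    return {c for c, k in counts.items() if k > n / 2}

trees = [(("A", "B"), ("C", "D")), ((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D")]
print(consensus_groups(trees))             # {A,B}, {A,B,C} and the root group
print(consensus_groups(trees, "strict"))   # only the root group {A,B,C,D}
```

Groups that pass the majority rule are always mutually compatible, so they can always be assembled into a single tree.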

BIOSTRATIGRAPHICAL CORRELATION WITH UNITARY ASSOCIATIONS

Quantitative or semi-quantitative methods for biostratigraphy are not yet in common use, except for the relatively subjective approach of graphical correlation. Such methods are, however, well developed, and we hope that the inclusion of one method in PAST will help introduce more paleontologists to this field. We have chosen to implement Unitary Associations analysis (see http://www.nhm.uio.no/~ohammer/past/unitary.html) (Guex 1991) because of its solid theoretical basis and minimum of statistical assumptions.

The data input consists of a presence-absence matrix with samples in rows and taxa in columns. Samples belong to a set of sections (localities), where the stratigraphical relationships within each section are known. The basic idea is to generate a set of assemblage zones (similar to ‘Oppel zones’) that are optimal in the sense that they give maximal stratigraphic resolution with a minimum of superpositional contradictions. An example of such a contradiction would be a section containing species A above species B, while assemblage 1 (containing species A) is placed below assemblage 2 (containing species B). The method of Unitary Associations is a logical but somewhat complicated procedure consisting of several steps. Its implementation in PAST does not include all the features found in the standard program, called BioGraph (Savary and Guex 1999), and advanced users are referred to that package.

PAST produces a detailed report of the analysis, including maximal cliques, unitary associations, correlation table, reproducibility matrix, contradictions between cliques, biostratigraphic graph, graph of superpositional relationships between maximal cliques, and strong components (cycles) in the graphs (Guex 1991). It is important to inspect these results thoroughly in order to assess the quality of the correlation and to improve the quality of the data, if necessary. Angiolini and Bucher (1999) give an example of such careful use of the method of Unitary Associations.

CASE STUDIES

The fourteen case studies have been designed to demonstrate both the use of different data analysis methods in paleontology and the specific use of the functions in the program. The cases are taken from such diverse fields as morphology, taxonomy, paleoecology, paleoclimatology, sedimentology, extinction studies, and biostratigraphy. The examples are drawn from both vertebrate and invertebrate paleontology, and they cover the whole of the Phanerozoic. These case studies are well suited for an introductory course in paleontological data analysis and have been tested in classroom situations. The cases are organized into four main subject areas: morphology and taxonomy, biogeography and paleoecology, time-series analysis, and biostratigraphy.

Case studies 1-5¹ involve the description and analysis of morphological variation of different sorts, while Case Study 6 targets some phylogenetic problems in a group of Cambrian trilobites and in mammals.

Case Study 1 investigates the external morphology of the Permian brachiopod Dielasma, developing ontogenetic models for the genus and comparing the growth rates and outlines of different samples from in and around a Permian reef complex. In a more focused exercise, Case Study 2 uses spatial statistics to assess the mode of distribution of tubercles on the cranidium of the trilobite Paradoxides from the middle Cambrian.

Case Study 3 tackles the multivariate morphometrics of the Ordovician illaenid trilobite Stenopareia using Principal Components Analysis (PCA), Principal Coordinates Analysis (PCO), and cluster and discriminant analyses to determine the validity of two species from Scandinavia.

1. PE Note: The Case Study files are available from the PE site and also directly from the author. The links below point to the author's site, which will, as time and the author proceed, contain updates and newer versions. The author’s site is: http://www.nhm.uio.no/~ohammer/past/.


Case Study 4 demonstrates the use of Elliptic Fourier shape analysis and principal components for detecting changes in trilobite cephalon shape through ontogeny.

In Case Study 5, aspects of the allometric growth of the Triassic rhynchosaur Scaphonyx are investigated using regression analysis.

Case Study 6 investigates the phylogenetic structure of the middle Cambrian Paradoxididae through cladistic analysis, using parsimony analysis and bootstrapping. Similar techniques can be applied to a matrix of 20 mammal taxa; cladograms generated by the program can be compared with a cluster analysis of the data matrix.

Case studies 7-11 cover aspects of paleobiogeography and paleoecology. Case Study 7 analyzes a global dataset of late Ordovician brachiopod distributions. A series of provincial faunas developed against a background of regression and cooler surface waters during the first strike of the late Ordovician (Hirnantian) glaciation. Through the calculation of similarity and distance coefficients together with cluster analysis, these data can be organized into a set of latitudinally controlled provinces. Seriation helps to identify any faunal, possibly climatically generated, gradients within the data structure.

In Case Study 8, faunal changes through a well-documented section in the upper Llanvirn rocks of central Wales are investigated graphically and by the calculation of diversity, dominance, and related parameters for each of ten horizons in the section. The changes in faunas fingerprint environmental shifts through the section, shadowed by marked changes in lithofacies. This dataset is ripe for considerable experimentation.

Case Study 9 involves a re-evaluation of Ziegler’s classic Lower Paleozoic depth-related communities from the Anglo-Welsh area. Using a range of multivariate techniques (similarity and distance coefficients, cluster analysis, detrended correspondence analysis, and seriation), the reality and mutual relationships of these benthic associations can be tested using a modified dataset.

Case Study 10 discusses some well-known Jurassic shelly faunas from England and France. The integrity and onshore-offshore distribution of six Corallian bivalve-dominated communities are investigated with diversity measures, cluster analysis, and detrended correspondence analysis.

Case Study 11 completes the analysis of biotic assemblages with an investigation of the direction and orientation of a bedding-plane sample of brachiopod shells from the upper Ordovician rocks of Scotland.

Two cases involve the study of time-series data. Case Study 12 investigates the periodicity of mass extinctions during the Permian to Recent time interval using spectral analysis. A number of diversity curves can be modeled for the Paleozoic and post-Paleozoic datasets available in Fossil Record 2, and turnover rates can be viewed for Phanerozoic biotas.

Case Study 13 addresses the periodicity of oxygen isotope data from ice cores representing the last million years of Earth history.

The final case study demonstrates the use of quantitative biostratigraphical correlation with the method of Unitary Associations. Eleven sections from the Eocene of Slovenia are correlated using alveolinid foraminiferans studied by Drobne.

CONCLUSION

Statistical and other quantitative methods are now very much part of the paleontologist’s tool kit. PAST is a free, user-friendly, and comprehensive package of statistical and graphical algorithms, tailor-made for the scientific investigation of paleontological material. PAST provides a window on current and future developments in this rapidly evolving research area. Together with a simple manual and linked case histories and datasets, the package is an ideal educational aid and first-approximation research tool. Planned future developments include extended functionality for morphometrics and the extension of available algorithms within the cladistics and unitary associations modules.

REFERENCES

Adrain, J.M., Westrop, S.R. and Chatterton, D.E. 2000. Silurian trilobite alpha diversity and the end-Ordovician mass extinction. Paleobiology, 26:625-646.

Angiolini, L. and Bucher, H. 1999. Taxonomy and quantitative biochronology of Guadalupian brachiopods from the Khuff Formation, Southeastern Oman. Geobios, 32:665-699.

Brower, J.C. and Kyle, K.M. 1988. Seriation of an original data matrix as applied to palaeoecology. Lethaia, 21:79-93.

Brown, D. and Rothery, P. 1993. Models in biology: mathematics, statistics and computing. John Wiley & Sons, New York.

Bruton, D.L. and Owen, A.W. 1988. The Norwegian Upper Ordovician illaenid trilobites. Norsk Geologisk Tidsskrift, 68:241-258.

Davis, J.C. 1986. Statistics and Data Analysis in Geology. John Wiley & Sons, New York.

Guex, J. 1991. Biochronological Correlations. Springer Verlag, Berlin.

Hammer, Ø. 2000. Spatial organisation of tubercles and terrace lines in Paradoxides forchhammeri - evidence of lateral inhibition. Acta Palaeontologica Polonica, 45:251-270.

Harper, D.A.T. (ed.). 1999. Numerical Palaeobiology. John Wiley & Sons, New York.

Harper, D.A.T. and Ryan, P.D. 1987. PALSTAT. A statistical package for palaeontologists. Lochee Publications and the Palaeontological Association.

Hill, M.O. and Gauch Jr, H.G. 1980. Detrended Correspondence analysis: an improved ordination technique. Vegetatio, 42:47-58.

Kitching, I.J., Forey, P.L., Humphries, C.J. and Williams, D.M. 1998. Cladistics. Oxford University Press, Oxford.

Krebs, C.J. 1989. Ecological Methodology. Harper & Row, New York.

Kuhl, F.P. and Giardina, C.R. 1982. Elliptic Fourier analysis of a closed contour. Computer Graphics and Image Processing, 18:259-278.

Muller, R.A. and MacDonald, G.J. 2000. Ice ages and astronomical causes: Data, Spectral Analysis, and Mechanisms. Springer Praxis, Berlin.

Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. 1992. Numerical Recipes in C. Cambridge University Press, Cambridge.

Raup, D. and Crick, R.E. 1979. Measurement of faunal similarity in paleontology. Journal of Paleontology, 53:1213-1227.

Raup, D. and Sepkoski, J.J. 1984. Periodicity of extinctions in the geologic past. Proceedings of the National Academy of Sciences, 81:801-805.

Renaud, S., Michaux, J., Jaeger, J.-J. and Auffray, J.-C. 1996. Fourier analysis applied to Stephanomys (Rodentia, Muridae) molars: nonprogressive evolutionary pattern in a gradual lineage. Paleobiology, 22:255-265.

Ryan, P.D., Harper, D.A.T. and Whalley, J.S. 1995. PALSTAT, Statistics for palaeontologists. Chapman & Hall (now Kluwer Academic Publishers).

Savary, J. and Guex, J. 1999. Discrete Biochronological Scales and Unitary Associations: Description of the BioGraph Computer Program. Mémoires de Géologie (Lausanne), 34.

Sepkoski, J.J. 1984. A kinetic model of Phanerozoic taxonomic diversity. Paleobiology, 10:246-267.
