grids 1.0 beta and beyond
DESCRIPTION
Grids 1.0 beta and beyond. Andy Turner http://www.geog.leeds.ac.uk/people/a.turner/. Outline. Introduction Detail What Why Memory Handling Next Steps Summary Questions Comments Advice. Introduction. Who are we?. Andy Turner Researcher Computational geographer Java programmer - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/1.jpg)
Grids 1.0 beta and Grids 1.0 beta and beyondbeyondAndy TurnerAndy Turner
http://www.geog.leeds.ac.uk/people/a.turner/http://www.geog.leeds.ac.uk/people/a.turner/
![Page 2: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/2.jpg)
OutlineOutline• Introduction• Detail
– What– Why– Memory Handling
• Next Steps• Summary• Questions Comments Advice
![Page 3: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/3.jpg)
IntroductionIntroduction
![Page 4: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/4.jpg)
Who are we?Who are we?
• Andy Turner– Researcher– Computational geographer– Java programmer– e-Social Science in action!
• You– Similar?– Who else?
![Page 5: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/5.jpg)
What is Grids 1.0 beta in a What is Grids 1.0 beta in a nutshell?nutshell?
• Java for processing numeric 2D Square Celled raster data
• Open Source LGPL research software• Beta 1 released in March 2005• Beta 6 released in March 2006 (latest)• Releases available via
http://www.geog.leeds.ac.uk/people/a.turner/src/java/grids
![Page 6: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/6.jpg)
Why develop Grids?Why develop Grids?
![Page 7: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/7.jpg)
Original MotivationOriginal Motivation
• Learn Java• Other software did not really do what I wanted• Develop a generally useful technology to support
applications• To control
– Numerical accuracy (precision)– Error handling
• To build a component on which other software can be based– Geographically Weighted Statistics (GWS) etc…
![Page 8: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/8.jpg)
What couldn’t other software do?What couldn’t other software do?
• Handle a single raster data layer with hundreds of thousands of rows and columns
• That was in the year 2000
![Page 9: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/9.jpg)
Maybe some (your?) software Maybe some (your?) software could… but anyway…could… but anyway…
![Page 10: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/10.jpg)
……much of this Original Motivation much of this Original Motivation is still reason for developing is still reason for developing
Grids…Grids…
![Page 11: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/11.jpg)
• Java is evolving• Data sets get larger• I still don’t know of any software that can
do the things I want• I still want control over numbers and errors• I am still developing other Java based on
Grids
![Page 12: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/12.jpg)
The state of GridsThe state of Grids• 1.0 because
– API is feature complete (relatively stable)– It has to reach 1.0 at some stage!
• Beta because– Documentation is OK but not great– No unit tests– Not enough good examples
• Used in teaching– A lecture, practical and workshop on a GIS and
Environment module at the University of Leeds• Run annually• Mixed bag of students are a great help
![Page 13: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/13.jpg)
More detailed More detailed description…description…
![Page 14: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/14.jpg)
Package StructurePackage Structure
• core• process
– Sets of methods for particular kinds of processing
• utilities– Generic code
• exchange– For loading and saving Grids
![Page 15: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/15.jpg)
corecore
![Page 16: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/16.jpg)
ChunkChunk
• chunkNRows = 7• chunkNCols = 6• A is a cell
– It has a value like all others in the grid
• This chunk comprises 42 cells
A
![Page 17: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/17.jpg)
GridGrid
• nChunkRows = 7• nChunkCols = 6• A is a chunk
– It is made up of cells• This Grid contains 42
chunks• If these chunks each
had 42 cells this would be 1764 cellsA
![Page 18: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/18.jpg)
Why ChunkWhy Chunk
• Each chunk can be stored optimally using any of a number of data structures
• Each chunk can be readily swapped and re-loaded as needs be– Memory handling
![Page 19: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/19.jpg)
Different types of ChunkDifferent types of Chunk
• 64CellMap• 2DArray• JAI• Map• RAF
![Page 20: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/20.jpg)
64CellMap (1/2)64CellMap (1/2)
• The most sophisticated data structure?• Data stored in a fast, lightweight
implementation of the java.util Collections API– gnu.trove
• TdoublelongHashMap• TintlongHashMap
• The long gives the mapping of the value to the 64 cells in the chunk
![Page 21: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/21.jpg)
64CellMap (2/2)64CellMap (2/2)
• For chunks that contain a single cell value there is a single mapping in the HashMap
• for chunks with 64 different cell values there are 64 mappings in the HashMap
• Iterating over (going through) the keys in the HashMap is necessary to get and set cell values, so generally this works faster for smaller numbers of mappings.
![Page 22: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/22.jpg)
Map type chunks in generalMap type chunks in general
• A mapping of keys (cell values) and values (cell identifiers) is a general way of storing grid data.– Efficient in terms of memory use where a default
value can be set, and if there are only a small number of non-default mappings in the chunk (compared to the number of cells in the chunk)
– Offer the means to generating some statistics about a chunk very efficiently
• the diversity (number of different values)• mode
![Page 23: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/23.jpg)
Factories and IteratorsFactories and Iterators
• Each chunk and grid is associated with a factory and an iterator
• Factories keep things tidy, production can be done in one place and in a controlled way
• Iterators aim to offer the fastest and most efficient way of going through all the values in a grid or chunk– These can be ordered and unordered
![Page 24: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/24.jpg)
Statistics (1/3)Statistics (1/3)
• Attached to every grid is a statistics object• Implemented by every statistics object and
every chunk and grid is a statistics interface
• Abstract classes provide a generic way of returning statistics
• Specific chunks and grids can override these methods to provide faster implementations
![Page 25: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/25.jpg)
Statistics (2/3)Statistics (2/3)
• Two basic types attached to a grid– Updated
• Statistics initialised and kept up to date as underlying data changes
• Better the more often statistics are used– Not updated
• Statistics not initialised or kept up to date as underlying data changes
• Far faster if statistics are not used
![Page 26: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/26.jpg)
Statistics (3/3)Statistics (3/3)
• nonNoDataValueCountBigInteger– number of cells with non noDataValues
• sumBigDecimal– the sum of all non noDataValues
• minBigDecimal– the minimum of all non noDataValues
• minCountBigInteger– the number of min values as a BigInteger
• … maxBigDecimal … maxCountBigInteger
![Page 27: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/27.jpg)
Memory HandlingMemory Handling
![Page 28: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/28.jpg)
/** * OutOfMemoryError Handling Wrapper for methodToProcess(args) * @param args Arguments needed for processing * @param handleOutOfMemoryError * If true then OutOfMemoryErrors are handled in this method by * calling swap(args) prior to recall of this method. * If false then OutOfMemoryErrors are caught and thrown. */public Object[] methodToProcess( Object[] args, boolean handleOutOfMemoryError ) { try { return methodToProcess( args ); } catch ( java.lang.OutOfMemoryError e ) { if ( handleOutOfMemoryError ) { swap(args); return methodToProcessl( args, handleOutOfMemoryError ); } else { throw e; } } }
![Page 29: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/29.jpg)
• Swapping a chunk with values that are needed by the method could leave us in an infinite loop
• Swapping a chunk with values that are needed soon is not efficient if other chunks could have been swapped
• It is difficult to have a generic swap operations for all methods that is efficient
• When processing there can be multiple grids and it can be better to swap chunks in output grids or coincident chunks, or chunks in one grid then the next etc…
• The programmer knows best…
Controlling SwapControlling Swap
![Page 30: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/30.jpg)
processprocess• Key Methods
– addToGrid– aggregation– value replacement
• mask– arithmetic operators
• subtract• multiply• divide
– rescaling– GWS– DEM
![Page 31: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/31.jpg)
GWSGWS• Weighting
– kernels• Normalisation• Multi scale generalisation• 2 main types:
– Univariate• First order
– mean, sum• Second order
– moments (proportions, variance, skewness)– Bivariate
• difference• normalized difference• correlation
![Page 32: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/32.jpg)
DEM extensionDEM extension• Methods
– Hollow or pit detection– Hollow filling– Flow accumulation
• Distributive• Based on all downslope cells not just maximum
– Geomorphometrics• E.g. Slope and aspect• Regional based and weighted like GWS
![Page 33: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/33.jpg)
Processing GridsProcessing Grids
• Simple case– A single input grid and a single numerical
result• Complex case
– Multiple input and output grids– Grids of different
• Sizes• Origins• Orientations
![Page 34: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/34.jpg)
Multiple input and output GridsMultiple input and output Grids
• All Grids hold a reference to a collection of all the Grids– Used for swapping data
![Page 35: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/35.jpg)
More about processing…More about processing…
• Often involves generalising all cell values that lie within specific distances
• Often uses a distance weighting scheme• Often involves producing outputs at the
same resolution as inputs• Often takes hours…• Is the main reason for developing Grids…
![Page 36: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/36.jpg)
Future DirectionsFuture Directions
![Page 37: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/37.jpg)
• Handling different types of cell value– So far int and double type cell values only– Next want boolean and BigDecimal
• Use a virtual file store– To distribute swap across multiple networked
machines– SRB?
• Take advantage of Java 1.5• Organise for parallel processing using MPJ• Enhance suite of geographical analysis methods
![Page 38: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/38.jpg)
• eScience Collaboration with China– Develop as a Grid Service
• Develop unit tests– Key to opening up development?
• Improve documentation
![Page 39: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/39.jpg)
SummarySummary
![Page 40: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/40.jpg)
Grids 1.0 beta is designed to Grids 1.0 beta is designed to handlehandle
• Multiple input and multiple output Grids• Grids with millions of rows and millions of
columns• Numerical data• Grids containing chunks of the same
dimensions
![Page 41: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/41.jpg)
State SummaryState Summary
• Not really taking advantage of Java 1.5– Currently based on JDK 1.4.2
• plus a few handy extras
• Not developing fast• Not abandoned• Not really openly developed
– Users encouraged to feedback and the problems get fixed by me…
![Page 42: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/42.jpg)
Thank You For Your Attention!Thank You For Your Attention!
Questions Advice Comments
http://www.geog.leeds.ac.uk/people/a.turner/
![Page 43: Grids 1.0 beta and beyond](https://reader033.vdocument.in/reader033/viewer/2022042719/56813b2c550346895da3f475/html5/thumbnails/43.jpg)
AcknowledgementsAcknowledgements• The European Commission has supported this work under the
following contracts: – IST-1999-10536 ( SPIN!-project ) – EVK2-CT-2000-00085 ( MedAction ) – EVK2-CT-2001-00109 ( DesertLinks ) – EVK1-CT-2002-00112 ( tempQsim )
• The ESRC has supported this work under: – RES-149-25-0034 ( MoSeS )
• Thank you James MacGill and Ian Turton for making available a version of the GeoTools Raster class Java source code which initially got me going with this package in January 2000.
• Thank you University of Leeds especially the School of Geography and CCG for your support and encouragement over the years.