Overview
The Command-Line Interface provides a way to execute a multitude of common operations on
GeoWave data stores without having to use the Programmatic API. It allows users to manage data
stores, indices, statistics, and more. All command options that are marked with * are required for the
command to execute.
Configuration
The CLI uses a local configuration file to store sets of data store connection parameters aliased by a
store name. Most GeoWave commands ask for a store name and use the configuration file to determine
which connection parameters should be used. It also stores connection information for GeoServer,
AWS, and HDFS for commands that use those services. This configuration file is generally stored in the
user’s home directory, although an alternate configuration file can be specified when running
commands.
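To illustrate, a configuration file for a single aliased store might contain entries along these lines (the key names shown are illustrative assumptions; run geowave config list to see the actual properties on your system):

```properties
# Hypothetical GeoWave CLI configuration fragment (key names are
# illustrative; inspect your own file with `geowave config list`)
store.example.type=rocksdb
store.example.opts.dir=/data/geowave/rocksdb
store.example.opts.batchWriteSize=1000
geoserver.url=http://localhost:8080/geoserver
hdfs.defaultFS.url=localhost:8020
```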
General Usage
The root of all GeoWave CLI commands is the base geowave command.
$ geowave
This will display a list of all available top-level commands along with a brief description of each.
Version
$ geowave --version
The --version flag will display various information about the installed version of GeoWave, including
the version, build arguments, and revision information.
General Flags
These flags can be optionally supplied to any GeoWave command, and should be supplied before the
command itself.
Config File
The --config-file flag causes GeoWave to use an alternate configuration file. The supplied file path
should include the file name (e.g. --config-file /mnt/config.properties). This can be useful if you have
multiple projects that use GeoWave and want to keep the configuration for those data stores separate
from each other.
$ geowave --config-file <path_to_file> <command>
Debug
The --debug flag causes all DEBUG, INFO, WARN, and ERROR log events to be output to the console. By
default, only WARN and ERROR log events are displayed.
$ geowave --debug <command>
Help Command
Adding help before any CLI command will show that command’s options and their defaults.
$ geowave help <command>
For example, using the help command on index add would result in the following output:
$ geowave help index add
Usage: geowave index add [options] <store name> <index name>
  Options:
    -np, --numPartitions
      The number of partitions. Default partitions will be 1.
      Default: 1
    -ps, --partitionStrategy
      The partition strategy to use. Default will be none.
      Default: NONE
      Possible Values: [NONE, HASH, ROUND_ROBIN]
  * -t, --type
      The type of index, such as spatial, or spatial_temporal
Explain Command
The explain command is similar to the help command in its usage, but shows all options, including
hidden ones. It can be a great way to make sure your parameters are correct before issuing a
command.
$ geowave explain <command>
For example, if you wanted to add a spatial index to a store named test-store but weren’t sure what
all of the options available to you were, you could do the following:
$ geowave explain index add -t spatial test-store spatial-idx
Command: geowave [options] <subcommand> ...

VALUE NEEDED    PARAMETER NAMES
----------------------------------------------
{          }    -cf, --config-file,
{          }    --debug,
{          }    --version,

Command: add [options]

VALUE NEEDED    PARAMETER NAMES
----------------------------------------------
{ EPSG:4326}    -c, --crs,
{     false}    -fp, --fullGeometryPrecision,
{         7}    -gp, --geometryPrecision,
{         1}    -np, --numPartitions,
{      NONE}    -ps, --partitionStrategy,
{     false}    --storeTime,
{   spatial}    -t, --type,

Expects: <store name> <index name>
Specified:
test-store spatial-idx
The output is broken down into two sections. The first section shows all of the options available on the
geowave command. If you wanted to use any of these options, they would need to be specified before
index add. The second section shows all of the options available on the index add command. Some
commands contain options that, when specified, may reveal more options. In this case, the -t spatial
option has revealed some additional configuration options that we could apply to the spatial index.
Another command where this is useful is the store add command, where each data store type specified
by the -t <store_type> option has a different set of configuration options.
Config Commands
Commands that affect the local GeoWave configuration.
Configure AWS
NAME
geowave-config-aws - configure GeoWave CLI for AWS S3 connections
SYNOPSIS
geowave config aws <AWS S3 endpoint URL>
DESCRIPTION
This command creates a local configuration for AWS S3 connections that is used by commands that
interface with S3.
EXAMPLES
Configure GeoWave to use an S3 bucket on us-west-2 called mybucket:
geowave config aws https://s3.us-west-2.amazonaws.com/mybucket
Configure GeoServer
NAME
geowave-config-geoserver - configure GeoWave CLI to connect to a GeoServer instance
SYNOPSIS
geowave config geoserver [options] <GeoServer URL>
DESCRIPTION
This command creates a local configuration for connecting to GeoServer which is used by geoserver or
gs commands.
OPTIONS
-p, --password <password>
GeoServer Password - Can be specified as 'pass:<password>', 'file:<local file containing the
password>', 'propfile:<local properties file containing the password>:<property file key>',
'env:<variable containing the pass>', or stdin
-u, --username <username>
GeoServer User
-ws, --workspace <workspace>
GeoServer Default Workspace
SSL CONFIGURATION OPTIONS
--sslKeyManagerAlgorithm <algorithm>
Specify the algorithm to use for the keystore.
--sslKeyManagerProvider <provider>
Specify the key manager factory provider.
--sslKeyPassword <password>
Specify the password to be used to access the server certificate from the specified keystore file. Can
be specified as pass:<password>, file:<local file containing the password>, propfile:<local
properties file containing the password>:<property file key>, env:<variable containing the
pass>, or stdin.
--sslKeyStorePassword <password>
Specify the password to use to access the keystore file. Can be specified as pass:<password>,
file:<local file containing the password>, propfile:<local properties file containing the
password>:<property file key>, env:<variable containing the pass>, or stdin.
--sslKeyStorePath <path>
Specify the absolute path to where the keystore file is located on system. The keystore contains the
server certificate to be loaded.
--sslKeyStoreProvider <provider>
Specify the name of the keystore provider to be used for the server certificate.
--sslKeyStoreType <type>
The type of keystore file to be used for the server certificate.
--sslSecurityProtocol <protocol>
Specify the Transport Layer Security (TLS) protocol to use when connecting to the server. By
default, the system will use TLS.
--sslTrustManagerAlgorithm <algorithm>
Specify the algorithm to use for the truststore.
--sslTrustManagerProvider <provider>
Specify the trust manager factory provider.
--sslTrustStorePassword <password>
Specify the password to use to access the truststore file. Can be specified as pass:<password>,
file:<local file containing the password>, propfile:<local properties file containing the
password>:<property file key>, env:<variable containing the pass>, or stdin.
--sslTrustStorePath <path>
Specify the absolute path to where truststore file is located on system. The truststore file is used to
validate client certificates.
--sslTrustStoreProvider <provider>
Specify the name of the truststore provider to be used for the server certificate.
--sslTrustStoreType <type>
Specify the type of key store used for the truststore, i.e. JKS (Java KeyStore).
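The password options above all accept the same set of specification formats. As a rough sketch of how such a specification could be resolved (this is an illustration, not GeoWave's actual implementation):

```python
import os
import sys

def resolve_password(spec):
    """Resolve a password specification of the form used by the CLI options.

    Supported forms (mirroring the documented formats):
      pass:<password>        - literal password
      file:<path>            - first line of a local file
      propfile:<path>:<key>  - value of <key> in a properties file
      env:<variable>         - value of an environment variable
      stdin                  - read from standard input
    """
    if spec == "stdin":
        return sys.stdin.readline().rstrip("\n")
    kind, _, rest = spec.partition(":")
    if kind == "pass":
        return rest
    if kind == "env":
        return os.environ[rest]
    if kind == "file":
        with open(rest) as f:
            return f.readline().rstrip("\n")
    if kind == "propfile":
        path, _, key = rest.partition(":")
        with open(path) as f:
            for line in f:
                k, _, v = line.strip().partition("=")
                if k == key:
                    return v
        raise KeyError(key)
    raise ValueError("unknown password format: " + spec)
```

The env and file forms are useful for keeping credentials out of shell history.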
EXAMPLES
Configure GeoWave to use locally running GeoServer:
geowave config geoserver "http://localhost:8080/geoserver"
Configure GeoWave to use GeoServer running on another host:
geowave config geoserver "${HOSTNAME}:8080"
Configure GeoWave to use a particular workspace on a GeoServer instance:
geowave config geoserver -ws myWorkspace "http://localhost:8080/geoserver"
Configure HDFS
NAME
geowave-config-hdfs - configure the GeoWave CLI to connect to HDFS
SYNOPSIS
geowave config hdfs <HDFS DefaultFS URL>
DESCRIPTION
This command creates a local configuration for HDFS connections, which is used by commands that
interface with HDFS.
EXAMPLES
Configure GeoWave to use locally running HDFS:
geowave config hdfs localhost:8020
List Configured Properties
NAME
geowave-config-list - list all configured properties
SYNOPSIS
geowave config list [options]
DESCRIPTION
This command will list all properties in the local configuration. This list can be filtered with a regular
expression using the -f or --filter options. A useful regular expression might be a store name, to see
all of the configured properties for a particular data store.
OPTIONS
-f, --filter <regex>
Filter list by a regular expression.
EXAMPLES
List all configuration properties:
geowave config list
List all configuration properties on a data store called example:
geowave config list -f example
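The -f filter is an ordinary regular expression matched against property names. A rough sketch of the filtering behavior (the property keys are illustrative, and whether the CLI anchors the pattern is not specified here; this sketch uses an unanchored search):

```python
import re

def filter_properties(properties, pattern):
    """Return only the properties whose name matches the given regex."""
    regex = re.compile(pattern)
    return {k: v for k, v in properties.items() if regex.search(k)}

# Hypothetical configuration contents
props = {
    "store.example.type": "rocksdb",
    "store.example.opts.batchWriteSize": "1000",
    "store.other.type": "accumulo",
}
# Filtering on a store name keeps only that store's properties.
filter_properties(props, "example")
```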
Configure Cryptography Key
NAME
geowave-config-newcryptokey - generate a new security cryptography key for use with configuration
properties
SYNOPSIS
geowave config newcryptokey
DESCRIPTION
This command will generate a new security cryptography key for use with configuration properties.
This is primarily used if there is a need to re-encrypt the local configurations based on a new security
token, should the old one have been compromised.
EXAMPLES
Generate a new cryptography key:
geowave config newcryptokey
Set Configuration Property
NAME
geowave-config-set - sets a property in the local configuration
SYNOPSIS
geowave config set [options] <name> <value>
DESCRIPTION
This command sets a property in the local configuration. This can be used to update a particular
configured property of a data store.
OPTIONS
--password
Specify that the value being set is a password and should be encrypted in the configuration.
EXAMPLES
Update the batch write size of a RocksDB data store named example:
geowave config set store.example.opts.batchWriteSize 1000
Update the password for an Accumulo data store named example:
geowave config set --password store.example.opts.password someNewPassword
Store Commands
Commands for managing GeoWave data stores.
Add Store
NAME
geowave-store-add - Add a data store to the GeoWave configuration
SYNOPSIS
geowave store add [options] <name>
DESCRIPTION
This command adds a new store to the GeoWave configuration. The store name can then be used by
other commands for interfacing with the configured data store.
OPTIONS
-d, --default
Make this the default store in all operations
*-t, --type <arg>
The type of store. A list of available store types can be found using the store listplugins command.
All core data stores have these options:
--gwNamespace <namespace>
The GeoWave namespace. By default, no namespace is used.
--enableServerSideLibrary <enabled>
Enable server-side operations if possible. Default is true.
--enableSecondaryIndexing
If specified, secondary indexing will be used.
--enableVisibility <enabled>
If specified, visibility will be explicitly enabled or disabled. Default is unspecified.
--maxRangeDecomposition <count>
The maximum number of ranges to use when breaking down queries.
--aggregationMaxRangeDecomposition <count>
The maximum number of ranges to use when breaking down aggregation queries.
When the accumulo type option is used, additional options are:
* -i, --instance <instance>
The Accumulo instance ID.
* -u, --user <user>
A valid Accumulo user ID.
-p, --password <password>
The password for the user. Can be specified as pass:<password>, file:<local file containing the
password>, propfile:<local properties file containing the password>:<property file key>,
env:<variable containing the pass>, or stdin.
*-z, --zookeeper <servers>
A comma-separated list of Zookeeper servers that an Accumulo instance is using.
When the hbase type option is used, additional options are:
* -z, --zookeeper <servers>
A comma-separated list of zookeeper servers that an HBase instance is using.
--coprocessorJar <path>
Path (HDFS URL) to the JAR containing coprocessor classes.
--disableVerifyCoprocessors
If specified, disable coprocessor verification, which ensures that coprocessors have been added to
the HBase table prior to executing server-side operations.
--scanCacheSize <size>
The number of rows passed to each scanner (higher values will enable faster scanners, but will use
more memory).
When the redis type option is used, additional options are:
* -a, --address <address>
The address to connect to, such as redis://127.0.0.1:6379.
--compression <compression>
The compression to use. Possible values are snappy, lz4, and none. Default is snappy.
When the rocksdb type option is used, additional options are:
--dir <path>
The directory to read/write to. Defaults to "rocksdb" in the working directory.
--compactOnWrite <enabled>
Whether to compact on every write, if false it will only compact on merge. Default is true.
--batchWriteSize <count>
The size (in records) for each batched write. Anything <= 1 will use synchronous single record
writes without batching. Default is 1000.
When the cassandra type option is used, additional options are:
* --contactPoints <contact points>
A single contact point or a comma delimited set of contact points to connect to the Cassandra
cluster.
--replicas <count>
The number of replicas to use when creating a new keyspace. Default is 3.
--durableWrites <enabled>
Whether to write to commit log for durability, configured only on creation of new keyspace. Default
is true.
--batchWriteSize <count>
The number of inserts in a batch write. Default is 50.
When the dynamodb type option is used, additional options are:
* --endpoint <endpoint>
[REQUIRED (or --region)] The endpoint to connect to.
* --region <region>
[REQUIRED (or --endpoint)] The AWS region to use.
--initialWriteCapacity <count>
The maximum number of writes consumed per second before throttling occurs. Default is 5.
--initialReadCapacity <count>
The maximum number of strongly consistent reads consumed per second before throttling occurs.
Default is 5.
--maxConnections <count>
The maximum number of open http(s) connections active at any given time. Default is 50.
--protocol <protocol>
The protocol to use. Possible values are HTTP or HTTPS, default is HTTPS.
--cacheResponseMetadata <enabled>
Whether to cache responses from AWS. High performance systems can disable this but debugging
will be more difficult. Default is true.
When the kudu type option is used, additional options are:
* --kuduMaster <url>
A URL for the Kudu master node.
When the bigtable type option is used, additional options are:
--projectId <project>
The Bigtable project to connect to. Default is geowave-bigtable-project-id.
--instanceId <instance>
The Bigtable instance to connect to. Default is geowave-bigtable-instance-id.
--scanCacheSize <size>
The number of rows passed to each scanner (higher values will enable faster scanners, but will use
more memory).
EXAMPLES
Add a data store called example that uses a locally running Accumulo instance:
geowave store add -t accumulo --zookeeper localhost:2181 --instance accumulo --user root --password secret example
Add a data store called example that uses a locally running HBase instance:
geowave store add -t hbase --zookeeper localhost:2181 example
Add a data store called example that uses a RocksDB database in the current directory:
geowave store add -t rocksdb example
Describe Store
NAME
geowave-store-describe - List properties of a data store
SYNOPSIS
geowave store describe <store name>
DESCRIPTION
This command displays all configuration properties of a given GeoWave data store.
EXAMPLES
List all configuration properties of the example data store:
geowave store describe example
Clear Store
NAME
geowave-store-clear - Clear ALL data from a GeoWave data store and delete tables
SYNOPSIS
geowave store clear <store name>
DESCRIPTION
This command clears ALL data from a GeoWave store and deletes tables.
EXAMPLES
Clear all data from the example data store:
geowave store clear example
Copy Store
NAME
geowave-store-copy - Copy a data store
SYNOPSIS
geowave store copy <input store name> <output store name>
DESCRIPTION
This command copies all of the data from one data store to another.
EXAMPLES
Copy all data from the example data store to the example_copy data store:
geowave store copy example example_copy
Copy Store with MapReduce
NAME
geowave-store-copymr - Copy a data store using MapReduce
SYNOPSIS
geowave store copymr [options] <input store name> <output store name>
DESCRIPTION
This command copies all of the data from one data store to another using MapReduce.
OPTIONS
* --hdfsHostPort <host>
The HDFS host and port.
* --jobSubmissionHostPort <host>
The job submission tracker host and port.
--maxSplits <count>
The maximum partitions for the input data.
--minSplits <count>
The minimum partitions for the input data.
--numReducers <count>
Number of threads writing at a time. Default is 8.
EXAMPLES
Copy all data from the example data store to the example_copy data store using MapReduce:
geowave store copymr --hdfsHostPort localhost:53000 --jobSubmissionHostPort localhost:8032 example example_copy
Copy Store Configuration
NAME
geowave-store-copycfg - Copy and modify existing store configuration
SYNOPSIS
geowave store copycfg [options] <name> <new name> [option_overrides]
DESCRIPTION
This command copies and modifies an existing GeoWave store. It is possible to override configuration
options as you copy by specifying the options after the new name, such as store copycfg old new
--gwNamespace new_namespace. It is important to note that this command does not copy data, only the
data store configuration.
OPTIONS
-d, --default
Makes this the default store in all operations.
EXAMPLES
Copy the example RocksDB data store configuration to example_alt, but with an alternate directory:
geowave store copycfg example example_alt --dir /alternate/directory
List Stores
NAME
geowave-store-list - List all configured data stores
SYNOPSIS
geowave store list
DESCRIPTION
This command displays all configured data stores and their types.
EXAMPLES
List all data stores:
geowave store list
Remove Store
NAME
geowave-store-rm - Removes an existing store from the GeoWave configuration
SYNOPSIS
geowave store rm <store name>
DESCRIPTION
This command removes an existing store from the GeoWave configuration. It does not remove any
data from that store.
EXAMPLES
Remove the example store from the configuration:
geowave store rm example
Store Version
NAME
geowave-store-version - Get the version of GeoWave used by a data store
SYNOPSIS
geowave store version <store name>
DESCRIPTION
This command returns the version of GeoWave used by a data store. This is usually the version
represented by the server-side libraries being used by the data store.
EXAMPLES
Get the version of GeoWave used by the example data store:
geowave store version example
List Store Plugins
NAME
geowave-store-listplugins - List all available store types
SYNOPSIS
geowave store listplugins
DESCRIPTION
This command lists all of the store types that can be added via the store add command.
EXAMPLES
List all store plugins:
geowave store listplugins
Index Commands
Commands for managing GeoWave indices.
Add Index
NAME
geowave-index-add - Add an index to a data store
SYNOPSIS
geowave index add [options] <store name> <index name>
DESCRIPTION
This command creates an index in a data store if it does not already exist.
OPTIONS
-np, --numPartitions <count>
The number of partitions. Default is 1.
-ps, --partitionStrategy <strategy>
The partition strategy to use. Possible values are NONE, HASH, and ROUND_ROBIN, default is NONE.
* -t, --type <type>
The type of index, such as spatial, temporal, or spatial_temporal
When the spatial type option is used, additional options are:
-c, --crs <crs>
The native Coordinate Reference System used within the index. All spatial data will be projected
into this CRS for appropriate indexing as needed. Default is EPSG:4326.
-fp, --fullGeometryPrecision
If specified, geometry will be encoded losslessly. Uses more disk space.
-gp, --geometryPrecision <precision>
The maximum precision of the geometry when encoding. Lower precision will save more disk
space when encoding. Possible values are between -8 and 7, default is 7.
--storeTime
If specified, the index will store temporal values. This allows it to slightly more efficiently run
spatial-temporal queries although if spatial-temporal queries are a common use case, a separate
spatial-temporal index is recommended.
When the spatial_temporal type option is used, additional options are:
-c, --crs <crs>
The native Coordinate Reference System used within the index. All spatial data will be projected
into this CRS for appropriate indexing as needed. Default is EPSG:4326.
-fp, --fullGeometryPrecision
If specified, geometry will be encoded losslessly. Uses more disk space.
-gp, --geometryPrecision <precision>
The maximum precision of the geometry when encoding. Lower precision will save more disk
space when encoding. Possible values are between -8 and 7, default is 7.
--bias <bias>
The bias of the spatial-temporal index. There can be more precision given to time or space if
necessary. Possible values are TEMPORAL, BALANCED, and SPATIAL, default is BALANCED.
--maxDuplicates <count>
The maximum number of duplicates per dimension range. The default is 2 per range (for example,
line and polygon timestamp data would be up to 4 because there are 2 dimensions, and line/polygon
time range data would be up to 8).
--period <periodicity>
The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval.
Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE, default is YEAR.
When the temporal type option is used, additional options are:
--maxDuplicates <count>
The maximum number of duplicates per dimension range. The default is 2 per range (for example,
line and polygon timestamp data would be up to 4 because there are 2 dimensions, and line/polygon
time range data would be up to 8).
--period <periodicity>
The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval.
Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE, default is YEAR.
--noTimeRange
If specified, the index will not support time ranges, which can be more efficient.
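To build intuition for the --period option: because time is continuous, values are binned at the chosen interval, and queries only need to touch the bins their time range overlaps. A simple sketch of YEAR-periodicity binning (an analogy for the concept, not GeoWave's internal indexing code):

```python
from datetime import datetime, timezone

def year_bin(ts):
    """Return the YEAR-periodicity bin a timestamp falls into.

    With a periodicity of YEAR, all values in the same calendar year
    share a bin; a finer periodicity (e.g. DAY) trades more bins for
    tighter query ranges.
    """
    return ts.year

a = datetime(2020, 3, 14, tzinfo=timezone.utc)
b = datetime(2020, 11, 2, tzinfo=timezone.utc)
c = datetime(2021, 1, 1, tzinfo=timezone.utc)
# a and b land in the same bin; c starts a new one.
```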
EXAMPLES
Add a spatial index called spatial_idx with CRS EPSG:3857 to the example data store:
geowave index add -t spatial -c EPSG:3857 example spatial_idx
Add a spatial-temporal index called st_idx with a periodicity of MONTH to the example data store:
geowave index add -t spatial_temporal --period MONTH example st_idx
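The effect of --geometryPrecision can be pictured as rounding each coordinate to a number of decimal digits, with negative values rounding to the left of the decimal point. The sketch below is an analogy for the precision setting, not GeoWave's actual geometry encoder:

```python
def apply_precision(coord, precision):
    """Round a coordinate as a precision setting of `precision` decimal
    digits would; negative precision rounds to tens, hundreds, etc."""
    return round(coord, precision)

# precision 7 (the default) preserves the coordinate as given;
# lower precision stores coarser, more compressible coordinates.
apply_precision(12.3456789, 7)   # -> 12.3456789
apply_precision(12.3456789, 2)   # -> 12.35
apply_precision(12.3456789, -1)  # -> 10.0
```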
Compact Index
NAME
geowave-index-compact - Compact all rows for a given index
SYNOPSIS
geowave index compact <store name> <index name>
DESCRIPTION
This command will allow a user to compact all rows for a given index.
EXAMPLES
Compact all rows on the spatial_idx index in the example store:
geowave index compact example spatial_idx
List Indices
NAME
geowave-index-list - Display all indices in a data store
SYNOPSIS
geowave index list <store name>
DESCRIPTION
This command displays all indices in a data store.
EXAMPLES
Display all indices in the example store:
geowave index list example
Remove Index
NAME
geowave-index-rm - Remove an index and all associated data from a data store
SYNOPSIS
geowave index rm <store name> <index name>
DESCRIPTION
This command removes an index and all of its data from a data store.
EXAMPLES
Remove the spatial_idx index from the example store:
geowave index rm example spatial_idx
List Index Plugins
NAME
geowave-index-listplugins - List all available index types
SYNOPSIS
geowave index listplugins
DESCRIPTION
This command lists all of the index types that can be added via the index add command.
EXAMPLES
List all index plugins:
geowave index listplugins
Type Commands
Commands for managing GeoWave types.
List Types
NAME
geowave-type-list - Display all types in a data store
SYNOPSIS
geowave type list <store name>
DESCRIPTION
This command displays all types in a GeoWave data store.
EXAMPLES
Display all types in the example data store:
geowave type list example
Remove Type
NAME
geowave-type-rm - Remove a type and all associated data from a data store
SYNOPSIS
geowave type rm <store name> <type name>
DESCRIPTION
This command removes a type and all associated data from a GeoWave data store.
EXAMPLES
Remove the hail type from the example data store:
geowave type rm example hail
Describe Type
NAME
geowave-type-describe - List attributes of a type in a data store
SYNOPSIS
geowave type describe <store name> <type name>
DESCRIPTION
This command lists the attributes of a type in a GeoWave data store. For vector types, each attribute
and its class are listed. For raster types, only the tile size is listed.
EXAMPLES
Describe the hail type in the example data store:
geowave type describe example hail
Statistics Commands
Commands to manage GeoWave statistics.
Calculate Stat
NAME
geowave-stat-calc - Calculate a specific statistic in the remote store, given a type name and stat type
SYNOPSIS
geowave stat calc [options] <store name> <type name> <stat type>
DESCRIPTION
This command calculates a specific statistic in the data store, given a type name and statistic type.
OPTIONS
--fieldName
The field name for the statistic, if the statistic is maintained per field.
--auth <authorizations>
The authorizations used for the statistics calculation. By default all authorizations are used.
--json
If specified, output will be formatted in JSON.
EXAMPLES
Calculate the COUNT_DATA statistic on the hail type in the example data store:
geowave stat calc example hail COUNT_DATA
Calculate the numeric range statistic of the AREA attribute of the hail type in the example data store:
geowave stat calc --fieldName AREA example hail FEATURE_NUMERIC_RANGE
List Stats
NAME
geowave-stat-list - Print statistics of a data store to standard output
SYNOPSIS
geowave stat list [options] <store name>
DESCRIPTION
This command prints statistics of a GeoWave data store (and optionally of a single type) to the standard
output.
OPTIONS
--typeName <type>
If specified, only statistics for the given type will be displayed.
--auth <authorizations>
The authorizations used for the statistics calculation. By default all authorizations are used.
--json
If specified, output will be formatted in JSON.
EXAMPLES
List all statistics in the example store:
geowave stat list example
List all statistics for the hail type in the example store in JSON format:
geowave stat list --json --typeName hail example
Compact Stats
NAME
geowave-stat-compact - Combine all statistics in a data store
SYNOPSIS
geowave stat compact <store name>
DESCRIPTION
This command combines all statistics in a GeoWave data store, which can make the data store more
efficient.
EXAMPLES
Compact all statistics in the example data store:
geowave stat compact example
Recalculate Stats
NAME
geowave-stat-recalc - Recalculate the statistics in a data store
SYNOPSIS
geowave stat recalc [options] <store name>
DESCRIPTION
This command recalculates the statistics of an existing GeoWave data store. If a type name is provided
as an option, only the statistics for that type will be recalculated.
OPTIONS
--typeName <type>
If specified, only statistics for the given type will be recalculated.
--auth <authorizations>
The authorizations used for the statistics calculation. By default all authorizations are used.
--json
If specified, output will be formatted in JSON.
EXAMPLES
Recalculate all of the statistics in the example data store:
geowave stat recalc example
Recalculate all of the statistics for the hail type in the example data store:
geowave stat recalc --typeName hail example
Remove Stat
NAME
geowave-stat-rm - Remove a statistic from a data store
SYNOPSIS
geowave stat rm [options] <store name> <type name> <stat type>
DESCRIPTION
This command removes a statistic from a GeoWave data store.
OPTIONS
--fieldName
The field name for the statistic, if the statistic is maintained per field.
--auth <authorizations>
The authorizations used for the statistics calculation. By default all authorizations are used.
--json
If specified, output will be formatted in JSON.
EXAMPLES
Remove the BOUNDING_BOX statistic of the hail type in the example data store:
geowave stat rm example hail BOUNDING_BOX
Ingest Commands
Commands that ingest data directly into GeoWave or stage data to be ingested into GeoWave.
Ingest Local to GeoWave
NAME
geowave-ingest-localToGW - Ingest supported files from the local file system
SYNOPSIS
geowave ingest localToGW [options] <file or directory> <store name> <comma delimited index list>
DESCRIPTION
This command runs the ingest code (parse to features, load features to GeoWave) against local file
system content.
OPTIONS
-t, --threads <count>
Number of threads to use for ingest. Default is 1.
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
-v, --visibility <visibility>
The visibility of the data ingested. Default is public.
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with over this vertex count will be
discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with over this vertex count will be
discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with over this vertex count will be
discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for the mosaic. Specifying no-data will mosaic the most recent tile
over the previous tiles, except where there are no-data values. By default, none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band; if
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on. This allows each band to have multiple differing no-data values if needed.
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
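The value-distribution rule for --geotools-raster.nodata can be sketched as follows. This is an illustration of the documented rule only, not GeoWave's actual parsing code; the function name and the example values are hypothetical.

```python
def assign_nodata(values, num_bands):
    """Split a flat list of no-data values across bands.

    Following the documented rule: the first len(values) // num_bands
    values apply to band 0, the next chunk to band 1, and so on, so
    each band may carry several no-data values. Assumes len(values)
    is a multiple of num_bands.
    """
    per_band = len(values) // num_bands
    return [values[i * per_band:(i + 1) * per_band] for i in range(num_bands)]


# Six values over three bands yield two no-data values per band:
print(assign_nodata([0, 255, 0, 254, 1, 253], 3))
# [[0, 255], [0, 254], [1, 253]]
```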
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to the date format of the file. Use commas to separate entries; the
first : character separates the field name from the format. Use \, to include a comma in the
format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and
time2 as dates with different formats.
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
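The parsing rules for the date-field map can be sketched in a few lines. This is an illustrative reimplementation of the documented syntax (split entries on unescaped commas, split each entry on its first colon), not GeoWave's internal code:

```python
import re


def parse_date_fields(spec):
    """Parse a field-to-date-format map as documented above.

    Entries are comma-separated, a literal comma inside a format is
    escaped as '\\,', and the first ':' in each entry separates the
    field name from its format.
    """
    # Split on commas that are not preceded by a backslash.
    entries = re.split(r"(?<!\\),", spec)
    result = {}
    for entry in entries:
        name, fmt = entry.split(":", 1)
        result[name] = fmt.replace("\\,", ",")
    return result


print(parse_date_fields("time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss"))
# {'time': 'MM:dd:YYYY', 'time2': 'YYYY/MM/dd hh:mm:ss'}
```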
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
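Taken together, the maxVertices, minSimpVertices, and tolerance options shared by the vector formats above describe a per-feature decision: discard features with too many vertices, simplify sufficiently complex ones, and ingest the rest unchanged. The following Python sketch illustrates that decision logic only; the threshold values shown are hypothetical, and GeoWave applies this logic internally during ingest:

```python
def vertex_filter_action(vertex_count, max_vertices, min_simp_vertices):
    """Decide what happens to a feature based on its vertex count.

    Mirrors the documented option semantics: features exceeding
    max_vertices are discarded; features with at least
    min_simp_vertices qualify for geometry simplification; everything
    else is ingested as-is.
    """
    if vertex_count > max_vertices:
        return "discard"
    if vertex_count >= min_simp_vertices:
        return "simplify"
    return "ingest"


# With hypothetical thresholds max_vertices=500, min_simp_vertices=100:
print(vertex_filter_action(600, 500, 100))  # discard
print(vertex_filter_action(250, 500, 100))  # simplify
print(vertex_filter_action(40, 500, 100))   # ingest
```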
EXAMPLES
Ingest GDELT data from an area around Germany from the gdelt_data directory into a GeoWave data
store called example in the spatial-idx index:
geowave ingest localToGW -f gdelt --gdelt.cql "BBOX(geometry,5.87,47.2,15.04,54.95)" ./gdelt_data example spatial-idx
Ingest a shapefile called states.shp into the example data store in the spatial-idx index:
geowave ingest localToGW -f geotools-vector states.shp example spatial-idx
Ingest Kafka to GeoWave
NAME
geowave-ingest-kafkaToGW - Subscribe to a Kafka topic and ingest into GeoWave
SYNOPSIS
geowave ingest kafkaToGW [options] <store name> <comma-delimited index list>
DESCRIPTION
This command ingests data from a Kafka topic into GeoWave.
OPTIONS
--autoOffsetReset <offset>
What to do when there is no initial offset in ZooKeeper or if an offset is out of range. If smallest is
used, automatically reset the offset to the smallest offset. If largest is used, automatically reset the
offset to the largest offset. Otherwise, throw an exception to the consumer.
--batchSize <size>
The data will automatically flush after this number of entries. Default is 10,000.
--consumerTimeoutMs <timeout>
By default, this value is -1 and a consumer blocks indefinitely if no new message is available for
consumption. By setting the value to a positive integer, a timeout exception is thrown to the
consumer if no message is available for consumption after the specified timeout value.
--fetchMessageMaxBytes <bytes>
The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request.
These bytes will be read into memory for each partition, so this helps control the memory used by
the consumer. The fetch request size must be at least as large as the maximum message size the
server allows or else it is possible for the producer to send messages larger than the consumer can
fetch.
--groupId <id>
A string that uniquely identifies the group of consumer processes to which this consumer belongs.
By setting the same group ID, multiple processes indicate that they are all part of the same
consumer group.
* --kafkaprops <file>
Properties file containing Kafka properties.
--reconnectOnTimeout
If specified, when the consumer timeout occurs (based on the Kafka property consumer.timeout.ms),
a flush will occur and the consumer will immediately reconnect.
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
-v, --visibility <visibility>
The visibility of the data ingested. Default is public.
--zookeeperConnect <host>
Specifies the ZooKeeper connection string in the form hostname:port where host and port are the
host and port of a ZooKeeper server. To allow connecting through other ZooKeeper nodes when that
ZooKeeper machine is down, you can also specify multiple hosts in the form
hostname1:port1,hostname2:port2,hostname3:port3.
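As a hedged illustration, a minimal properties file for --kafkaprops might look like the following. The property names mirror the options above and the classic Kafka high-level consumer (the --reconnectOnTimeout description above references consumer.timeout.ms), but they are assumptions and should be checked against your Kafka release:

```properties
# Example kafka.properties for --kafkaprops (property names follow the
# classic Kafka high-level consumer; verify against your Kafka version).
zookeeper.connect=zk-host:2181
group.id=geowave-ingest
auto.offset.reset=smallest
consumer.timeout.ms=5000
fetch.message.max.bytes=5242880
```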
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether the extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for the mosaic. Specifying no-data will mosaic the most recent tile
over the previous tiles, except where there are no-data values. By default, none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band; if
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on. This allows each band to have multiple differing no-data values if needed.
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to the date format of the file. Use commas to separate entries; the
first : character separates the field name from the format. Use \, to include a comma in the
format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and
time2 as dates with different formats.
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
Stage Local to HDFS
NAME
geowave-ingest-localToHdfs - Stage supported files in the local file system to HDFS
SYNOPSIS
geowave ingest localToHdfs [options] <file or directory> <hdfs host:port> <path to base directory to write to>
DESCRIPTION
This command stages supported files in the local file system to HDFS.
OPTIONS
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether the extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for the mosaic. Specifying no-data will mosaic the most recent tile
over the previous tiles, except where there are no-data values. By default, none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band; if
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on. This allows each band to have multiple differing no-data values if needed.
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to the date format of the file. Use commas to separate entries; the
first : character separates the field name from the format. Use \, to include a comma in the
format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and
time2 as dates with different formats.
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
Stage Local to Kafka
NAME
geowave-ingest-localToKafka - Stage supported files in the local file system to a Kafka topic
SYNOPSIS
geowave ingest localToKafka [options] <file or directory>
DESCRIPTION
This command stages supported files in the local file system to a Kafka topic.
OPTIONS
* --kafkaprops <file>
Properties file containing Kafka properties.
--metadataBrokerList <brokers>
This is used for bootstrapping; the producer uses it only to fetch metadata (topics, partitions,
and replicas). The socket connections for sending the actual data are established based on the
broker information returned in the metadata. The format is host1:port1,host2:port2, and the list
can be a subset of brokers or a VIP pointing to a subset of brokers.
--producerType <type>
Specifies whether messages are sent asynchronously in a background thread. Valid values are async
for asynchronous send and sync for synchronous send. Setting the producer to async allows requests
to be batched together (which is great for throughput) but opens the possibility of losing unsent
data if the client machine fails.
--requestRequiredAcks <count>
This value controls when a produce request is considered completed. Specifically, it sets how many
other brokers must have committed the data to their log and acknowledged this to the leader.
--retryBackoffMs <time>
The amount of time to wait before attempting to retry a failed produce request to a given topic
partition. This avoids repeated sending-and-failing in a tight loop.
--serializerClass <class>
The serializer class for messages. The default encoder takes a byte[] and returns the same byte[].
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether the extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for the mosaic. Specifying no-data will mosaic the most recent tile
over the previous tiles, except where there are no-data values. By default, none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band; if
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on. This allows each band to have multiple differing no-data values if needed.
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to the date format of the file. Use commas to separate entries; the
first : character separates the field name from the format. Use \, to include a comma in the
format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and
time2 as dates with different formats.
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
Ingest Local to GeoWave with MapReduce
NAME
geowave-ingest-localToMrGW - Copy supported files from the local file system to HDFS and ingest
from HDFS
SYNOPSIS
geowave ingest localToMrGW [options] <file or directory> <hdfs host:port> <path to base directory to write to> <store name> <comma-delimited index list>
DESCRIPTION
This command copies supported files from the local file system to HDFS and then ingests them from HDFS.
OPTIONS
--jobtracker <host>
Hadoop job tracker hostname and port in the format hostname:port.
--resourceman <host>
YARN resource manager hostname and port in the format hostname:port.
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
-v, --visibility <visibility>
The visibility of the data ingested. Default is public.
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether the extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; only feature types matching the specified type
names will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for a feature. Features exceeding this vertex count will be
discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over
the previous tiles, except where there are no data values. By default none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band. If
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on, so each band can have multiple differing no-data values if needed.
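The value-to-band assignment described above can be sketched in shell. The values and band count here are made up purely for illustration and are not tied to any real raster:

```shell
# Hypothetical illustration of how multiple --geotools-raster.nodata values
# are distributed across bands: with N values and B bands, the first N/B
# values apply to band 1, the next N/B to band 2, and so on.
values="0 255 -1 -9999 1 2"   # 6 no-data values (made up for illustration)
bands=3
n=$(echo $values | wc -w)     # total number of no-data values
per=$((n / bands))            # values assigned to each band
band=1; i=0; group=""; result=""
for v in $values; do
  group="$group$v "
  i=$((i + 1))
  if [ "$i" -eq "$per" ]; then
    result="${result}band$band:[${group% }] "
    band=$((band + 1)); i=0; group=""
  fi
done
echo "$result"
```

With 6 values and 3 bands, each band receives 2 no-data values in order.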
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to their date formats in the file. Separate entries with commas; within
each entry, the first : character separates the field name from the format. Use \, to include a
comma in a format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields
time and time2 as dates with different formats.
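As a sketch of the entry syntax, the argument for several date fields can be assembled in shell. The third field name and its comma-containing format are hypothetical, added only to show the \, escape:

```shell
# Build a --geotools-vector.data argument from individual field:format entries.
field1='time:MM:dd:YYYY'
field2='time2:YYYY/MM/dd hh:mm:ss'
# A format containing a literal comma must escape it with a backslash:
field3='published:MMM d\, YYYY'    # hypothetical field name and format
data_arg="${field1},${field2},${field3}"
echo "$data_arg"
```

The resulting string would then be passed as the value of --geotools-vector.data.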
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
Ingest MapReduce to GeoWave
NAME
geowave-ingest-mrToGW - Ingest supported files that already exist in HDFS
SYNOPSIS
geowave ingest mrToGW [options] <hdfs host:port> <path to base directory to write to> <store name> <comma-delimited index list>
DESCRIPTION
This command ingests supported files that already exist in HDFS to GeoWave.
OPTIONS
--jobtracker <host>
Hadoop job tracker hostname and port in the format hostname:port.
--resourceman <host>
Yarn resource manager hostname and port in the format hostname:port.
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
-v, --visibility <visibility>
The visibility of the data ingested. Default is public.
When the avro format is used, additional options are:
--avro.avro
If specified, indicates that the operation should use Avro feature serialization.
--avro.cql <filter>
An optional CQL filter. If specified, only data matching the filter will be ingested.
--avro.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all type names will be ingested.
--avro.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--avro.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--avro.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the gdelt format is used, additional options are:
--gdelt.avro
A flag to indicate whether Avro feature serialization should be used.
--gdelt.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gdelt.extended
A flag to indicate whether the extended data format should be used.
--gdelt.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--gdelt.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--gdelt.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gdelt.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geolife format is used, additional options are:
--geolife.avro
A flag to indicate whether Avro feature serialization should be used.
--geolife.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geolife.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--geolife.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--geolife.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--geolife.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the geotools-raster format is used, additional options are:
--geotools-raster.coverage <name>
Coverage name for the raster. Default is the name of the file.
--geotools-raster.crs <crs>
A CRS override for the provided raster file.
--geotools-raster.histogram
If specified, build a histogram of samples per band on ingest for performing band equalization.
--geotools-raster.mergeStrategy <strategy>
The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over
the previous tiles, except where there are no data values. By default none is used.
--geotools-raster.nodata <value>
Optional parameter to set no-data values. If one value is given, it is applied to each band. If
multiple values are given, the first totalNoDataValues/totalBands values are applied to the first
band, and so on, so each band can have multiple differing no-data values if needed.
--geotools-raster.pyramid
If specified, build an image pyramid on ingest for quick reduced resolution query.
--geotools-raster.separateBands
If specified, separate each band into its own coverage name. By default the coverage name will have
_Bn appended to it where n is the band’s index.
--geotools-raster.tileSize <size>
The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
--geotools-vector.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--geotools-vector.data <fields>
A map of date field names to their date formats in the file. Separate entries with commas; within
each entry, the first : character separates the field name from the format. Use \, to include a
comma in a format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields
time and time2 as dates with different formats.
--geotools-vector.type <types>
Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
--gpx.avro
A flag to indicate whether Avro feature serialization should be used.
--gpx.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--gpx.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--gpx.maxLength <degrees>
Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long
GPX tracks.
--gpx.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--gpx.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--gpx.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the tdrive format is used, additional options are:
--tdrive.avro
A flag to indicate whether Avro feature serialization should be used.
--tdrive.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--tdrive.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--tdrive.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--tdrive.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--tdrive.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
When the twitter format is used, additional options are:
--twitter.avro
A flag to indicate whether Avro feature serialization should be used.
--twitter.cql <filter>
A CQL filter; only data matching this filter will be ingested.
--twitter.typename <types>
A comma-delimited set of type names to ingest; feature types matching the specified type names
will be ingested. By default, all types will be ingested.
--twitter.maxVertices <count>
Maximum number of vertices to allow for the feature. Features with more than this vertex count
will be discarded.
--twitter.minSimpVertices <count>
Minimum vertex count to qualify for geometry simplification.
--twitter.tolerance <tolerance>
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%).
Default is 0.02.
Ingest Spark to GeoWave
NAME
geowave-ingest-sparkToGW - Ingest supported files that already exist in HDFS or S3 using Spark
SYNOPSIS
geowave ingest sparkToGW [options] <input directory> <store name> <comma-delimited index list>
DESCRIPTION
This command ingests supported files that already exist in HDFS or S3 using Spark.
OPTIONS
-ho, --hosts <host>
The Spark driver host. Default is localhost.
-m, --master <designation>
The Spark master designation. Default is local.
-n, --name <name>
The Spark application name. Default is Spark Ingest.
-c, --numcores <count>
The number of cores to use.
-e, --numexecutors <count>
The number of executors to use.
-x, --extension <extensions>
Individual or comma-delimited set of file extensions to accept.
-f, --formats <formats>
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all
available ingest formats will be used.
-v, --visibility <visibility>
The visibility of the data ingested. Default is public.
List Ingest Plugins
NAME
geowave-ingest-listplugins - List supported ingest formats
SYNOPSIS
geowave ingest listplugins
DESCRIPTION
This command will list all ingest formats supported by the version of GeoWave being run.
EXAMPLES
List all ingest plugins:
geowave ingest listplugins
Analytic Commands
Commands that run MapReduce or Spark processing to enhance an existing GeoWave dataset.
NOTE
The commands below can also be run as a Yarn or Hadoop API command (i.e.
mapreduce).
For instance, to run the analytic using Yarn:
yarn jar geowave-tools.jar analytic <algorithm> <options> <store>
Density-Based Scan
NAME
geowave-analytic-dbscan - Density-Based Scanner
SYNOPSIS
geowave analytic dbscan [options] <storename>
DESCRIPTION
This command runs a density based scanner analytic on GeoWave data.
OPTIONS
-conf, --mapReduceConfigFile <file>
MapReduce configuration file.
* -hdfsbase, --mapReduceHdfsBaseDir <path>
Fully qualified path to the base directory in HDFS.
* -jobtracker, --mapReduceJobtrackerHostPort <host>
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
* -resourceman, --mapReduceYarnResourceManager <host>
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format
hostname:port.
-hdfs, --mapReduceHdfsHostPort <host>
HDFS hostname and port in the format hostname:port.
--cdf, --commonDistanceFunctionClass <class>
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
* --query.typeNames <types>
The comma-separated list of types to query; by default all types are used.
--query.auth <auths>
The comma-separated list of authorizations used during extract; by default all authorizations are
used.
--query.index <index>
The specific index to query; by default one is chosen for each adapter.
* -emx, --extractMaxInputSplit <size>
Maximum HDFS input split size.
* -emn, --extractMinInputSplit <size>
Minimum HDFS input split size.
-eq, --extractQuery <query>
Query
-ofc, --outputOutputFormat <class>
Output format class.
-ifc, --inputFormatClass <class>
Input format class.
-orc, --outputReducerCount <count>
Number of reducers for output.
* -cmi, --clusteringMaxIterations <count>
Maximum number of iterations when finding optimal clusters.
* -cms, --clusteringMinimumSize <size>
Minimum cluster size.
* -pmd, --partitionMaxDistance <distance>
Maximum partition distance.
-b, --globalBatchId <id>
Batch ID.
-hdt, --hullDataTypeId <id>
Data Type ID for a centroid item.
-hpe, --hullProjectionClass <class>
Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
-ons, --outputDataNamespaceUri <namespace>
Output namespace for objects that will be written to GeoWave.
-odt, --outputDataTypeId <id>
Output Data ID assigned to objects that will be written to GeoWave.
-oop, --outputHdfsOutputPath <path>
Output HDFS file path.
-oid, --outputIndexId <index>
Output index for objects that will be written to GeoWave.
-pdt, --partitionDistanceThresholds <thresholds>
Comma separated list of distance thresholds, per dimension.
-pdu, --partitionGeometricDistanceUnit <unit>
Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).
-pms, --partitionMaxMemberSelection <count>
Maximum number of members selected from a partition.
-pdr, --partitionPartitionDecreaseRate <rate>
Rate of decrease for precision (within (0,1]).
-pp, --partitionPartitionPrecision <precision>
Partition precision.
-pc, --partitionPartitionerClass <class>
Index identifier for centroids.
-psp, --partitionSecondaryPartitionerClass <class>
Perform secondary partitioning with the provided class.
EXAMPLES
Run through 5 max iterations (-cmi), with a minimum cluster size of 10 (-cms), a minimum HDFS
input split of 2 (-emn), a maximum HDFS input split of 6 (-emx), a maximum search distance of 1000
meters (-pmd), and a reducer count of 4 (-orc). The HDFS IPC port is localhost:53000 (-hdfs), the
Yarn job tracker is at localhost:8032 (-jobtracker), the temporary files needed by this job are
stored in hdfs:/host:port/user/rwgdrummer (-hdfsbase), the data type used is gpxpoint
(--query.typeNames), and the data store connection parameters are loaded from my_store.
geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store
EXECUTION
DBSCAN uses GeoWaveInputFormat to load data from GeoWave into HDFS. You can use the extract
query parameter to limit the records used in the analytic.
It iteratively calls Nearest Neighbor to execute a sequence of concave hulls. The hulls are saved into
sequence files written to a temporary HDFS directory, and then read in again for the next DBSCAN
iteration.
After completion, the data is written back from HDFS to Accumulo using a job called the "input load
runner".
Kernel Density Estimate
NAME
geowave-analytic-kde - Kernel Density Estimate
SYNOPSIS
geowave analytic kde [options] <input store name> <output store name>
DESCRIPTION
This command runs a Kernel Density Estimate analytic on GeoWave data.
OPTIONS
* --coverageName <name>
The output coverage name.
* --featureType <type>
The name of the feature type to run a KDE on.
* --minLevel <level>
The minimum zoom level to run a KDE at.
* --maxLevel <level>
The maximum zoom level to run a KDE at.
--minSplits <count>
The minimum partitions for the input data.
--maxSplits <count>
The maximum partitions for the input data.
--tileSize <size>
The size of output tiles.
--cqlFilter <filter>
An optional CQL filter applied to the input data.
--indexName <index>
An optional index to filter the input data.
--outputIndex <index>
An optional index for output data store. Only spatial index type is supported.
--hdfsHostPort <host>
The HDFS host and port.
* --jobSubmissionHostPort <host>
The job submission tracker host and port in the format hostname:port.
EXAMPLES
Perform a Kernel Density Estimation using a local resource manager at port 8032 on the gdeltevent
type. The KDE is run at zoom levels 5-26, and the generated raster is written under the type name
gdeltevent_kde. The input and output data store is called gdelt.
geowave analytic kde --featureType gdeltevent --jobSubmissionHostPort localhost:8032 --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt
Kernel Density Estimate on Spark
NAME
geowave-analytic-kdespark - Kernel Density Estimate using Spark
SYNOPSIS
geowave analytic kdespark [options] <input store name> <output store name>
DESCRIPTION
This command runs a Kernel Density Estimate analytic on GeoWave data using Apache Spark.
OPTIONS
* --coverageName <name>
The output coverage name.
* --featureType <type>
The name of the feature type to run a KDE on.
* --minLevel <level>
The minimum zoom level to run a KDE at.
* --maxLevel <level>
The maximum zoom level to run a KDE at.
--minSplits <count>
The minimum partitions for the input data.
--maxSplits <count>
The maximum partitions for the input data.
--tileSize <size>
The size of output tiles.
--cqlFilter <filter>
An optional CQL filter applied to the input data.
--indexName <index>
An optional index name to filter the input data.
--outputIndex <index>
An optional index for output data store. Only spatial index type is supported.
-n, --name <name>
The Spark application name.
-ho, --host <host>
The Spark driver host.
-m, --master <designation>
The Spark master designation.
EXAMPLES
Perform a Kernel Density Estimation using a local Spark cluster on the gdeltevent type. The KDE is
run at zoom levels 5-26, and the generated raster is written under the type name gdeltevent_kde.
The input and output data store is called gdelt.
geowave analytic kdespark --featureType gdeltevent -m local --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt
K-means Jump
NAME
geowave-analytic-kmeansjump - KMeans Clustering using Jump Method
SYNOPSIS
geowave analytic kmeansjump [options] <store name>
DESCRIPTION
This command executes a KMeans Clustering analytic using a Jump Method.
OPTIONS
-conf, --mapReduceConfigFile <file>
MapReduce configuration file.
* -hdfsbase, --mapReduceHdfsBaseDir <path>
Fully qualified path to the base directory in HDFS.
* -jobtracker, --mapReduceJobtrackerHostPort <host>
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
* -resourceman, --mapReduceYarnResourceManager <host>
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format
hostname:port.
-hdfs, --mapReduceHdfsHostPort <host>
HDFS hostname and port in the format hostname:port.
--cdf, --commonDistanceFunctionClass <class>
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
* --query.typeNames <types>
The comma-separated list of types to query; by default all types are used.
--query.auth <auths>
The comma-separated list of authorizations used during extract; by default all authorizations are
used.
--query.index <index>
The specific index to query; by default one is chosen for each adapter.
* -emx, --extractMaxInputSplit <size>
Maximum HDFS input split size.
* -emn, --extractMinInputSplit <size>
Minimum HDFS input split size.
-eq, --extractQuery <query>
Query
-ofc, --outputOutputFormat <class>
Output format class.
-ifc, --inputFormatClass <class>
Input format class.
-orc, --outputReducerCount <count>
Number of reducers for output.
-cce, --centroidExtractorClass <class>
Centroid extractor class that implements
org.locationtech.geowave.analytics.extract.CentroidExtractor.
-cid, --centroidIndexId <index>
Index to use for centroids.
-cfc, --centroidWrapperFactoryClass <class>
A factory class that implements
org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
-czl, --centroidZoomLevel <level>
Zoom level for centroids.
-cct, --clusteringConverganceTolerance <tolerance>
Convergence tolerance.
* -cmi, --clusteringMaxIterations <count>
Maximum number of iterations when finding optimal clusters.
-crc, --clusteringMaxReducerCount <count>
Maximum clustering reducer count.
* -zl, --clusteringZoomLevels <count>
Number of zoom levels to process.
-dde, --commonDimensionExtractClass <class>
Dimension extractor class that implements
org.locationtech.geowave.analytics.extract.DimensionExtractor.
-ens, --extractDataNamespaceUri <namespace>
Output data namespace URI.
-ede, --extractDimensionExtractClass <class>
Class to extract dimensions into a simple feature output.
-eot, --extractOutputDataTypeId <type>
Output data type ID.
-erc, --extractReducerCount <count>
Number of reducers for initial data extraction and de-duplication.
-b, --globalBatchId <id>
Batch ID.
-pb, --globalParentBatchId <id>
Parent Batch ID.
-hns, --hullDataNamespaceUri <namespace>
Data type namespace for a centroid item.
-hdt, --hullDataTypeId <type>
Data type ID for a centroid item.
-hid, --hullIndexId <index>
Index to use for centroids.
-hpe, --hullProjectionClass <class>
Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
-hrc, --hullReducerCount <count>
Centroid reducer count.
-hfc, --hullWrapperFactoryClass <class>
Class to create analytic item to capture hulls. Implements
org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
* -jkp, --jumpKplusplusMin <value>
The minimum K when K-means parallel takes over sampling.
* -jrc, --jumpRangeOfCentroids <ranges>
Comma-separated range of centroids (e.g. 2,100).
EXAMPLES
The maximum number of clustering iterations is 15 (-cmi), the zoom level is 1 (-zl), the maximum
HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files
needed by this job are stored in hdfs:/host:port/user/rwgdrummer/temp_dir_kmeans (-hdfsbase), the
HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker),
the type used is 'hail' (--query.typeNames), the minimum K for K-means parallel sampling is 3
(-jkp), the comma-separated range of centroids is 4,8 (-jrc), and the data store parameters are
loaded from my_store.
geowave analytic kmeansjump -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -jkp 3 -jrc 4,8 my_store
EXECUTION
KMeansJump uses most of the same parameters as KMeansParallel. It tries every K value given (-jrc)
to find the value with the least entropy. The other value, jkp, specifies which K values should use
K-means parallel for sampling versus a single sampler (which uses a random sample). For instance,
if you specify 4,8 for jrc and 6 for jkp, then K=4,5 will use the K-means parallel sampler, while
6,7,8 will use the single sampler.
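The K-to-sampler assignment in the worked example above can be sketched in shell (assuming, as the example states, that K values below jkp use the parallel sampler):

```shell
# Which K values use which sampler, given -jrc 4,8 and -jkp 6 (the example's values)?
jrc_min=4; jrc_max=8; jkp=6
parallel=""; single=""
k=$jrc_min
while [ "$k" -le "$jrc_max" ]; do
  if [ "$k" -lt "$jkp" ]; then
    parallel="$parallel $k"   # K below jkp: K-means parallel sampler
  else
    single="$single $k"       # K at or above jkp: single (random) sampler
  fi
  k=$((k + 1))
done
echo "parallel sampler:$parallel"
echo "single sampler:$single"
```

This reproduces the split from the text: K=4,5 use the parallel sampler and K=6,7,8 use the single sampler.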
KMeansJump runs several iterations, executing the sampler (described above, which also calls the
normal K-means algorithm to determine centroids) and then a K-means distortion job, which
calculates the entropy of the calculated centroids.
Look at the EXECUTION documentation for the kmeansparallel command for discussion of output,
tolerance, and performance variables.
K-means Parallel
NAME
geowave-analytic-kmeansparallel - K-means Parallel Clustering
SYNOPSIS
geowave analytic kmeansparallel [options] <store name>
DESCRIPTION
This command executes a K-means Parallel Clustering analytic.
OPTIONS
-conf, --mapReduceConfigFile <file>
MapReduce configuration file.
* -hdfsbase, --mapReduceHdfsBaseDir <path>
Fully qualified path to the base directory in HDFS.
* -jobtracker, --mapReduceJobtrackerHostPort <host>
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
* -resourceman, --mapReduceYarnResourceManager <host>
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format
hostname:port.
-hdfs, --mapReduceHdfsHostPort <host>
HDFS hostname and port in the format hostname:port.
--cdf, --commonDistanceFunctionClass <class>
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
* --query.typeNames <types>
The comma-separated list of types to query; by default all types are used.
--query.auth <auths>
The comma-separated list of authorizations used during extract; by default all authorizations are
used.
--query.index <index>
The specific index to query; by default one is chosen for each adapter.
* -emx, --extractMaxInputSplit <size>
Maximum HDFS input split size.
* -emn, --extractMinInputSplit <size>
Minimum HDFS input split size.
-eq, --extractQuery <query>
Query
-ofc, --outputOutputFormat <class>
Output format class.
-ifc, --inputFormatClass <class>
Input format class.
-orc, --outputReducerCount <count>
Number of reducers for output.
-cce, --centroidExtractorClass <class>
Centroid extractor class that implements
org.locationtech.geowave.analytics.extract.CentroidExtractor.
-cid, --centroidIndexId <index>
Index to use for centroids.
-cfc, --centroidWrapperFactoryClass <class>
A factory class that implements
org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
-czl, --centroidZoomLevel <level>
Zoom level for centroids.
-cct, --clusteringConverganceTolerance <tolerance>
Convergence tolerance.
* -cmi, --clusteringMaxIterations <count>
Maximum number of iterations when finding optimal clusters.
-crc, --clusteringMaxReducerCount <count>
Maximum clustering reducer count.
* -zl, --clusteringZoomLevels <count>
Number of zoom levels to process.
-dde, --commonDimensionExtractClass <class>
Dimension extractor class that implements
org.locationtech.geowave.analytics.extract.DimensionExtractor.
-ens, --extractDataNamespaceUri <namespace>
Output data namespace URI.
-ede, --extractDimensionExtractClass <class>
Class to extract dimensions into a simple feature output.
-eot, --extractOutputDataTypeId <type>
Output data type ID.
-erc, --extractReducerCount <count>
Number of reducers for initial data extraction and de-duplication.
-b, --globalBatchId <id>
Batch ID.
-pb, --globalParentBatchId <id>
Parent Batch ID.
-hns, --hullDataNamespaceUri <namespace>
Data type namespace for a centroid item.
-hdt, --hullDataTypeId <type>
Data type ID for a centroid item.
-hid, --hullIndexId <index>
Index to use for centroids.
-hpe, --hullProjectionClass <class>
Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
-hrc, --hullReducerCount <count>
Centroid reducer count.
-hfc, --hullWrapperFactoryClass <class>
Class to create analytic item to capture hulls. Implements
org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
* -sxs, --sampleMaxSampleSize <size>
Maximum sample size.
* -sms, --sampleMinSampleSize <size>
Minimum sample size.
* -ssi, --sampleSampleIterations <count>
Minimum number of sample iterations.
EXAMPLES
The maximum number of clustering iterations is 15 (-cmi), the zoom level is 1 (-zl), the maximum
HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files
needed by this job are stored in hdfs:/host:port/user/rwgdrummer/temp_dir_kmeans (-hdfsbase), the
HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker),
the type used is 'hail' (--query.typeNames), the minimum sample size is 4 (-sms, which is kmin),
the maximum sample size is 8 (-sxs, which is kmax), the minimum number of sampling iterations is 10
(-ssi), and the data store parameters are loaded from my_store.
geowave analytic kmeansparallel -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -sms 4 -sxs 8 -ssi 10 my_store
EXECUTION
K-means parallel tries to identify the optimal K (between -sms and -sxs) for a set of zoom levels
(1 to -zl). When the zoom level is 1, it performs a normal K-means and finds K clusters. If the
zoom level is 2 or higher, it takes each cluster found and tries to create sub-clusters (bounded by
that cluster), identifying a new optimal K for each sub-cluster. As such, without powerful
infrastructure, this approach could take a significant amount of time to complete with zoom levels
higher than 1.
K-means parallel executes by first executing an extraction and de-duplication on data received via
GeoWaveInputFormat. The data is copied to HDFS for faster processing. The K-sampler job is used to pick
sample centroid points. These centroids are then assigned a cost, and then weak centroids are stripped
before the K-sampler is executed again. This process iterates several times, before the best centroid
locations are found, which are fed into the real K-means algorithm as initial guesses. K-means iterates
until the tolerance is reached (-cct, which defaults to 0.0001) or the max iterations is met (-cmi).
After execution, K-means parallel writes the centroids to an output data type (-eot, defaults to
centroid), and then creates an informational set of convex hulls which you can plot in GeoServer to
visually identify cluster groups (-hdt, defaults to convex_hull).
For tuning performance, you can set the number of reducers used in each step. The extraction/dedupe
reducer count is -crc, the clustering reducer count is -erc, the convex hull reducer count is -hrc, and the
output reducer count is -orc.
If you would like to run the algorithm multiple times, it may be useful to set the batch id (-b), which
can be used to distinguish between multiple batches (runs).
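Putting the tuning options together, a run that sets a batch ID and explicit reducer counts for each step might look like the following sketch. The flag values here are illustrative only, not recommendations; they build on the earlier example:

```shell
# K-means parallel with a batch ID (-b) and per-step reducer counts
# (-crc extraction/dedupe, -erc clustering, -hrc convex hull, -orc output).
geowave analytic kmeansparallel -cmi 15 -zl 1 -emx 4000 -emn 100 \
  -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 \
  -jobtracker localhost:8032 --query.typeNames hail \
  -sms 4 -sxs 8 -ssi 10 \
  -b batch_1 -crc 4 -erc 4 -hrc 2 -orc 2 my_store
```

Running the command again with a different -b value keeps the centroids and hulls from each run distinguishable in the output store.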
K-means on Spark
NAME
geowave-analytic-kmeansspark - K-means Clustering via Spark ML
SYNOPSIS
geowave analytic kmeansspark [options] <input store name> <output store name>
DESCRIPTION
This command executes a K-means clustering analytic via Spark ML.
OPTIONS
-ct, --centroidType <type>
Feature type name for centroid output. Default is kmeans-centroids.
-ch, --computeHullData
If specified, hull count, area, and density will be computed.
--cqlFilter <filter>
An optional CQL filter applied to the input data.
-e, --epsilon <tolerance>
The convergence tolerance.
-f, --featureType <type>
Feature type name to query.
-ht, --hullType <type>
Feature type name for hull output. Default is kmeans-hulls.
-h, --hulls
If specified, convex hulls will be generated.
-ho, --host <host>
The Spark driver host. Default is localhost.
-m, --master <designation>
The Spark master designation. Default is yarn.
--maxSplits <count>
The maximum partitions for the input data.
--minSplits <count>
The minimum partitions for the input data.
-n, --name <name>
The Spark application name. Default is KMeans Spark.
-k, --numClusters <count>
The number of clusters to generate. Default is 8.
-i, --numIterations <count>
The number of iterations to run. Default is 20.
-t, --useTime
If specified, the time field from the input data will be used.
EXAMPLES
Perform a K-means analytic on a local spark cluster on the hail type in the my_store data store and
output the results to the same data store:
geowave analytic kmeansspark -m local -f hail my_store my_store
Nearest Neighbor
NAME
geowave-analytic-nn - Nearest Neighbors
SYNOPSIS
geowave analytic nn [options] <store name>
DESCRIPTION
This command executes a Nearest Neighbors analytic. This is similar to DBScan, with fewer arguments.
Nearest neighbor simply dumps all near neighbors for every feature to a list of pairs. Most developers will
want to extend the framework to add their own extensions.
OPTIONS
-conf, --mapReduceConfigFile <file>
MapReduce configuration file.
* -hdfsbase, --mapReduceHdfsBaseDir <path>
Fully qualified path to the base directory in HDFS.
* -jobtracker, --mapReduceJobtrackerHostPort <host>
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
* -resourceman, --mapReduceYarnResourceManager <host>
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format
hostname:port.
-hdfs, --mapReduceHdfsHostPort <host>
HDFS hostname and port in the format hostname:port.
-cdf, --commonDistanceFunctionClass <class>
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
* --query.typeNames <types>
The comma-separated list of types to query; by default all types are used.
--query.auth <auths>
The comma-separated list of authorizations used during extract; by default all authorizations are
used.
--query.index <index>
The specific index to query; by default one is chosen for each adapter.
* -emx, --extractMaxInputSplit <size>
Maximum HDFS input split size.
* -emn, --extractMinInputSplit <size>
Minimum HDFS input split size.
-eq, --extractQuery <query>
Query
-ofc, --outputOutputFormat <class>
Output format class.
-ifc, --inputFormatClass <class>
Input format class.
-orc, --outputReducerCount <count>
Number of reducers for output.
* -oop, --outputHdfsOutputPath <path>
Output HDFS file path.
-pdt, --partitionDistanceThresholds <thresholds>
Comma separated list of distance thresholds, per dimension.
-pdu, --partitionGeometricDistanceUnit <unit>
Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).
* -pmd, --partitionMaxDistance <distance>
Maximum partition distance.
-pms, --partitionMaxMemberSelection <count>
Maximum number of members selected from a partition.
-pp, --partitionPartitionPrecision <precision>
Partition precision.
-pc, --partitionPartitionerClass <class>
Perform primary partitioning for centroids with the provided class.
-psp, --partitionSecondaryPartitionerClass <class>
Perform secondary partitioning for centroids with the provided class.
EXAMPLES
The minimum HDFS input split is 2 (-emn), maximum HDFS input split is 6 (-emx), maximum search
distance is 1000 meters (-pmd), the sequence file output directory is
hdfs://host:port/user/rwgdrummer_out, reducer count is 4 (-orc), the HDFS IPC port is localhost:53000 (
-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the temporary files needed by this job are
stored in hdfs://host:port/user/rwgdrummer (-hdfsbase), the input type is gpxpoint (-query.typeNames),
and the data store parameters are loaded from my_store.
geowave analytic nn -emn 2 -emx 6 -pmd 1000 -oop /user/rwgdrummer_out -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store
EXECUTION
To execute a nearest neighbor search in GeoWave, we use the concept of a "partitioner" to partition all
data on the Hilbert curve into square segments for the purposes of parallelizing the search.
The -pmd option is the most important variable here: it describes the maximum distance for a point to
be considered a neighbor of another point. Note that the default partitioner multiplies this value by 2
and uses the result for the actual partition size, so the terminology can be a bit confusing.
Spark SQL
NAME
geowave-analytic-sql - SparkSQL queries
SYNOPSIS
geowave analytic sql [options] <sql query>
DESCRIPTION
This command executes a Spark SQL query against a given data store, e.g. select * from <store
name>[|<type name>] where <condition>. An alternate way of querying vector data is by using the
vector query command, which does not use Spark, but provides a more robust set of querying
capabilities.
OPTIONS
-n, --name <name>
The Spark application name. Default is GeoWave Spark SQL.
-ho, --host <host>
The Spark driver host. Default is localhost.
-m, --master <designation>
The Spark master designation. Default is yarn.
--csv <file>
The output CSV file name.
--out <store name>
The output data store name.
--outtype <type>
The output type to output results to.
-s, --show <count>
Number of result rows to display. Default is 20.
EXAMPLES
Select all features from the hail type in the my_store data store using a local Spark cluster:
geowave analytic sql -m local "select * from my_store|hail"
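In addition to displaying results, the query output can be redirected using the --csv, --out, and --outtype options listed above. The file, store, and type names in this sketch are illustrative:

```shell
# Write query results to a CSV file
geowave analytic sql -m local --csv results.csv "select * from my_store|hail"

# Write query results to a new type in an output data store
geowave analytic sql -m local --out other_store --outtype hail_copy "select * from my_store|hail"
```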
Spark Spatial Join
NAME
geowave-analytic-spatialjoin - Spatial join using Spark
SYNOPSIS
geowave analytic spatialjoin [options] <left store name> <right store name> <output store name>
DESCRIPTION
This command executes a spatial join, taking two input types and outputting features from each side
that match a given predicate.
OPTIONS
-n, --name <name>
The Spark application name. Default is GeoWave Spark SQL.
-ho, --host <host>
The Spark driver host. Default is localhost.
-m, --master <designation>
The Spark master designation. Default is yarn.
-pc, --partCount <count>
The default partition count to set for Spark RDDs. Should be big enough to support the largest RDD
that will be used. Sets spark.default.parallelism.
-lt, --leftTypeName <type>
Feature type name of left store to use in join.
-ol, --outLeftTypeName <type>
Feature type name of left join results.
-rt, --rightTypeName <type>
Feature type name of right store to use in join.
-or, --outRightTypeName <type>
Feature type name of right join results.
-p, --predicate <predicate>
Name of the UDF function to use when performing spatial join. Default is GeomIntersects.
-r, --radius <radius>
Used for distance join predicate and other spatial operations that require a scalar radius. Default is
0.01.
-not, --negative
Used for testing a negative result from a geometry predicate, i.e., GeomIntersects() == false.
EXAMPLES
Using a local Spark cluster, join all features from the hail type in the my_store store that intersect
features from the boundary type in the other_store store, and output the results to the left and right
types in the my_store data store:
geowave analytic spatialjoin -m local -lt hail -rt boundary -ol left -or right my_store other_store my_store
Vector Commands
Commands that operate on vector data.
Query
NAME
geowave-vector-query - Query vector data using GeoWave Query Language
SYNOPSIS
geowave vector query [options] <query>
DESCRIPTION
This command queries vector data using an SQL-like syntax. The query language currently only
supports SELECT and DELETE statements.
The syntax for SELECT statements is as follows:
SELECT <attributes> FROM <storeName>.<typeName> [ WHERE CQL(<cqlFilter>) ]
Where <attributes> is a comma-separated list of column selectors or aggregation functions,
<storeName> is the data store name, <typeName> is the type name, and <cqlFilter> is a CQL filter to filter
the results by.
The syntax for DELETE statements is as follows:
DELETE FROM <storeName>.<typeName> [ WHERE CQL(<cqlFilter>) ]
Where <storeName> is the data store name, <typeName> is the type name, and <cqlFilter> is the filter to
delete results by.
OPTIONS
--debug
If specified, print out additional info for debug purposes.
-f, --format <format>
Output format for query results. Possible values are console, csv, shp, and geojson. Both shp and
geojson formats require that the query results contain at least 1 geometry column. Default is
console.
When the csv format is used, additional options are:
* -o, --outputFile <file>
CSV file to output query results to.
When the shp format is used, additional options are:
* -o, --outputFile <file>
Shapefile to output query results to.
-t, --typeName <name>
Output feature type name.
When the geojson format is used, additional options are:
* -o, --outputFile <file>
GeoJson file to output query results to.
-t, --typeName <name>
Output feature type name.
EXAMPLES
Calculate the total population of countries that intersect a bounding box that covers a region of
Europe:
geowave vector query "SELECT SUM(population) FROM example.countries WHERE CQL(BBOX(geom, 7, 46, 23, 51))"
Select only countries that have a population over 100 million:
geowave vector query "SELECT * FROM example.countries WHERE CQL(population>100000000)"
Output country names and populations to a CSV file:
geowave vector query -f csv -o myfile.csv "SELECT name, population FROM example.countries"
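The examples above only cover SELECT; a DELETE statement follows the same pattern. The attribute name and threshold in this sketch are illustrative:

```shell
# Delete all countries with a population under one million
geowave vector query "DELETE FROM example.countries WHERE CQL(population < 1000000)"
```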
CQL Delete
NAME
geowave-vector-cqldelete - Delete data that matches a CQL filter
SYNOPSIS
geowave vector cqldelete [options] <store name>
DESCRIPTION
This command deletes all data in a data store that matches a CQL filter.
OPTIONS
--typeName <type>
The type to delete data from.
* --cql <filter>
All data that matches the CQL filter will be deleted.
--debug
If specified, print out additional info for debug purposes.
--indexName <index>
The name of the index to delete from.
EXAMPLES
Delete all data from the hail type in the example data store that lies within the given bounding box:
geowave vector cqldelete --typeName hail --cql "BBOX(geom, 7, 46, 23, 51)" example
Local Export
NAME
geowave-vector-localexport - Export vector data from a data store to Avro
SYNOPSIS
geowave vector localexport [options] <store name>
DESCRIPTION
This command exports vector data from a GeoWave data store to an Avro file.
OPTIONS
--typeNames <types>
Comma separated list of types to export.
--batchSize <size>
Records to process at a time. Default is 10,000.
--cqlFilter <filter>
Filter exported data based on CQL filter.
--indexName <index>
The name of the index to export from.
* --outputFile <file>
The file to export data to.
EXAMPLES
Export all data from the hail type in the example data store to an Avro file:
geowave vector localexport --typeNames hail --outputFile out.avro example
MapReduce Export
NAME
geowave-vector-mrexport - Export vector data from a data store to Avro using MapReduce
SYNOPSIS
geowave vector mrexport [options] <path to base directory to write to> <store name>
DESCRIPTION
This command will perform a data export for vector data in a data store, and will use MapReduce to
support high-volume data stores.
OPTIONS
--typeNames <types>
Comma separated list of types to export.
--batchSize <size>
Records to process at a time. Default is 10,000.
--cqlFilter <filter>
Filter exported data based on CQL filter.
--indexName <index>
The name of the index to export from.
--maxSplits <count>
The maximum partitions for the input data.
--minSplits <count>
The minimum partitions for the input data.
--resourceManagerHostPort <host>
The host and port of the resource manager.
EXAMPLES
Export all data from the hail type in the example data store to an Avro file using MapReduce:
geowave vector mrexport --typeNames hail --resourceManagerHostPort localhost:8032 /export example
Raster Commands
Commands that operate on raster data.
Resize with MapReduce
NAME
geowave-raster-resizemr - Resize Raster Tiles using MapReduce
SYNOPSIS
geowave raster resizemr [options] <input store name> <output store name>
DESCRIPTION
This command will resize raster tiles that are stored in a GeoWave data store using MapReduce, and
write the resized tiles to a new output store.
OPTIONS
* --hdfsHostPort <host>
The HDFS host and port.
--indexName <index>
The index that the input raster is stored in.
* --inputCoverageName <name>
The name of the input raster coverage.
* --jobSubmissionHostPort <host>
The job submission tracker host and port.
--maxSplits <count>
The maximum partitions for the input data.
--minSplits <count>
The minimum partitions for the input data.
* --outputCoverageName <name>
The output raster coverage name.
* --outputTileSize <size>
The tile size to output.
EXAMPLES
Resize the cov raster in the example data store to 256 and name the resulting raster cov_resized:
geowave raster resizemr --hdfsHostPort localhost:53000 --jobSubmissionHostPort localhost:8032 --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 example example
Resize with Spark
NAME
geowave-raster-resizespark - Resize Raster Tiles using Spark
SYNOPSIS
geowave raster resizespark [options] <input store name> <output store name>
DESCRIPTION
This command will resize raster tiles that are stored in a GeoWave data store using Spark, and write
the resized tiles to a new output store.
OPTIONS
-ho, --host <host>
The Spark driver host. Default is localhost.
--indexName <index>
The index that the input raster is stored in.
* --inputCoverageName <name>
The name of the input raster coverage.
-m, --master <designation>
The Spark master designation. Default is yarn.
--maxSplits <count>
The maximum partitions for the input data.
--minSplits <count>
The minimum partitions for the input data.
-n, --name <name>
The Spark application name. Default is RasterResizeRunner.
* --outputCoverageName <name>
The output raster coverage name.
* --outputTileSize <size>
The tile size to output.
EXAMPLES
Resize the cov raster in the example data store to 256 and name the resulting raster cov_resized:
geowave raster resizespark -m local --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 example example
Install GDAL
NAME
geowave-raster-installgdal - Install GDAL by downloading native libraries
SYNOPSIS
geowave raster installgdal [options]
DESCRIPTION
This command will download GDAL to a directory. The directory should be added to the PATH and
Linux users should set LD_LIBRARY_PATH to the directory. This command is not currently supported on
Mac.
OPTIONS
--dir <path>
The download directory.
GeoServer Commands
Commands that manage GeoServer stores and layers.
Run GeoServer
NAME
geowave-gs-run - Runs a standalone GeoServer instance
SYNOPSIS
geowave gs run [options]
DESCRIPTION
This command runs a standalone GeoServer instance.
OPTIONS
-d, --directory <path>
The directory to use for GeoServer. Default is ./lib/services/third-party/embedded-
geoserver/geoserver.
-i, --interactive
If specified, prompt for user input to end the process.
-p, --port <port>
Select the port for GeoServer to listen on. Default is 8080.
EXAMPLES
Run a standalone GeoServer instance:
geowave gs run
Store Commands
Add Store
NAME
geowave-gs-ds-add - Add a data store to GeoServer
SYNOPSIS
geowave gs ds add [options] <data store name>
geowave geoserver datastore add [options] <data store name>
DESCRIPTION
This command adds a GeoWave data store to GeoServer as a GeoWave store.
OPTIONS
-ds, --datastore <name>
The name of the new GeoWave store to add to GeoServer.
-ws, --workspace <workspace>
The GeoServer workspace to use for the store.
EXAMPLES
Add a GeoWave data store example as a GeoWave store in GeoServer called my_store:
geowave gs ds add -ds my_store example
Get Store
NAME
geowave-gs-ds-get - Get GeoServer store info
SYNOPSIS
geowave gs ds get [options] <store name>
geowave geoserver datastore get [options] <store name>
DESCRIPTION
This command returns information about a store within the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Get information about the my_store store from GeoServer:
geowave gs ds get my_store
Get Store Adapters
NAME
geowave-gs-ds-getsa - Get type info from a GeoWave store
SYNOPSIS
geowave gs ds getsa <store name>
geowave geoserver datastore getstoreadapters <store name>
DESCRIPTION
This command returns information about all the GeoWave types in a store from the configured
GeoServer instance.
EXAMPLES
Get information about all the GeoWave types in the my_store store on GeoServer:
geowave gs ds getsa my_store
List Stores
NAME
geowave-gs-ds-list - List GeoServer stores
SYNOPSIS
geowave gs ds list [options]
geowave geoserver datastore list [options]
DESCRIPTION
This command lists stores from the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
List all stores in GeoServer:
geowave gs ds list
Remove Store
NAME
geowave-gs-ds-rm - Remove GeoServer store
SYNOPSIS
geowave gs ds rm [options] <store name>
geowave geoserver datastore rm [options] <store name>
DESCRIPTION
This command removes a store from the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Remove the my_store store from GeoServer:
geowave gs ds rm my_store
Coverage Store Commands
Add Coverage Store
NAME
geowave-gs-cs-add - Add a coverage store to GeoServer
SYNOPSIS
geowave gs cs add [options] <store name>
geowave geoserver coveragestore add [options] <store name>
DESCRIPTION
This command adds a coverage store to the configured GeoServer instance. It requires that a GeoWave
store has already been added.
OPTIONS
-cs, --coverageStore <name>
The name of the coverage store to add.
-histo, --equalizeHistogramOverride
This parameter will override the behavior to always perform histogram equalization if a histogram
exists.
-interp, --interpolationOverride <value>
This will override the default interpolation stored for each layer. Valid values are 0, 1, 2, 3 for
NearestNeighbor, Bilinear, Bicubic, and Bicubic (polynomial variant) respectively.
-scale, --scaleTo8Bit
By default, integer values will automatically be scaled to 8-bit and floating point values will not.
This can be overridden by setting this option.
-ws, --workspace <workspace>
The GeoServer workspace to add the coverage store to.
EXAMPLES
Add a coverage store called cov_store to GeoServer using the my_store GeoWave store:
geowave gs cs add -cs cov_store my_store
Get Coverage Store
NAME
geowave-gs-cs-get - Get GeoServer coverage store info
SYNOPSIS
geowave gs cs get [options] <coverage store name>
geowave geoserver coveragestore get [options] <coverage store name>
DESCRIPTION
This command will return information about a coverage store from the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Get information about the coverage store called my_store from GeoServer:
geowave gs cs get my_store
List Coverage Stores
NAME
geowave-gs-cs-list - List GeoServer coverage stores
SYNOPSIS
geowave gs cs list [options]
geowave geoserver coveragestore list [options]
DESCRIPTION
This command lists all coverage stores in the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
List all coverage stores in GeoServer:
geowave gs cs list
Remove Coverage Store
NAME
geowave-gs-cs-rm - Remove GeoServer Coverage Store
SYNOPSIS
geowave gs cs rm [options] <coverage store name>
geowave geoserver coveragestore rm [options] <coverage store name>
DESCRIPTION
This command removes a coverage store from the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Remove the cov_store coverage store from GeoServer:
geowave gs cs rm cov_store
Coverage Commands
Add Coverage
NAME
geowave-gs-cv-add - Add a coverage to GeoServer
SYNOPSIS
geowave gs cv add [options] <coverage name>
geowave geoserver coverage add [options] <coverage name>
DESCRIPTION
This command adds a coverage to the configured GeoServer instance.
OPTIONS
* -cs, --cvgstore <name>
Coverage store name.
-ws, --workspace <workspace>
GeoServer workspace to add the coverage to.
EXAMPLES
Add a coverage called cov to the cov_store coverage store on the configured GeoServer instance:
geowave gs cv add -cs cov_store cov
Get Coverage
NAME
geowave-gs-cv-get - Get a GeoServer coverage’s info
SYNOPSIS
geowave gs cv get [options] <coverage name>
geowave geoserver coverage get [options] <coverage name>
DESCRIPTION
This command returns information about a coverage from the configured GeoServer instance.
OPTIONS
-cs, --coverageStore <name>
The name of the GeoServer coverage store.
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Get information about the cov coverage in the cov_store coverage store:
geowave gs cv get -cs cov_store cov
List Coverages
NAME
geowave-gs-cv-list - List GeoServer coverages
SYNOPSIS
geowave gs cv list [options] <coverage store name>
geowave geoserver coverage list [options] <coverage store name>
DESCRIPTION
This command lists all coverages from a given coverage store in the configured GeoServer instance.
OPTIONS
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
List all coverages in the cov_store coverage store on GeoServer:
geowave gs cv list cov_store
Remove Coverage
NAME
geowave-gs-cv-rm - Remove a GeoServer coverage
SYNOPSIS
geowave gs cv rm [options] <coverage name>
geowave geoserver coverage rm [options] <coverage name>
DESCRIPTION
This command removes a coverage from the configured GeoServer instance.
OPTIONS
* -cs, --cvgstore <name>
The coverage store that contains the coverage.
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Remove the cov coverage from the cov_store coverage store in GeoServer:
geowave gs cv rm -cs cov_store cov
Layer Commands
Add GeoWave Layer
NAME
geowave-gs-layer-add - Add a GeoServer layer from the given GeoWave data store
SYNOPSIS
geowave gs layer add [options] <data store name>
geowave geoserver layer add [options] <data store name>
DESCRIPTION
This command adds a layer from the given GeoWave data store to the configured GeoServer instance.
Unlike gs fl add, this command adds a layer directly from a GeoWave data store, automatically
creating the GeoWave store for it in GeoServer.
OPTIONS
-t, --typeName <type>
Add the type with the given name to GeoServer.
-a, --add <layer type>
Add all layers of the given type to GeoServer. Possible values are ALL, RASTER, and VECTOR.
-sld, --setStyle <style>
The default style to use for the added layers.
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Add a type called hail from the example data store to GeoServer:
geowave gs layer add -t hail example
Add all types from the example data store to GeoServer:
geowave gs layer add --add ALL example
Add all vector types from the example data store to GeoServer:
geowave gs layer add --add VECTOR example
Add Feature Layer
NAME
geowave-gs-fl-add - Add a feature layer to GeoServer
SYNOPSIS
geowave gs fl add [options] <layer name>
geowave geoserver featurelayer add [options] <layer name>
DESCRIPTION
This command adds a feature layer from a GeoWave store to the configured GeoServer instance.
OPTIONS
* -ds, --datastore <name>
The GeoWave store (on GeoServer) to add the layer from.
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
Add a layer called hail from the my_store GeoWave store:
geowave gs fl add -ds my_store hail
Get Feature Layer
NAME
geowave-gs-fl-get - Get GeoServer feature layer info
SYNOPSIS
geowave gs fl get <layer name>
geowave geoserver featurelayer get <layer name>
DESCRIPTION
This command returns information about a layer in the configured GeoServer instance.
EXAMPLES
Get information about the layer hail from GeoServer:
geowave gs fl get hail
List Feature Layers
NAME
geowave-gs-fl-list - List GeoServer feature layers
SYNOPSIS
geowave gs fl list [options]
geowave geoserver featurelayer list [options]
DESCRIPTION
This command lists feature layers from the configured GeoServer instance.
OPTIONS
-ds, --datastore <name>
The GeoServer store name to list feature layers from.
-g, --geowaveOnly
If specified, only layers from GeoWave stores will be listed.
-ws, --workspace <workspace>
The GeoServer workspace to use.
EXAMPLES
List all feature layers in GeoServer:
geowave gs fl list
List all GeoWave feature layers in GeoServer:
geowave gs fl list -g
List all feature layers from the my_store store in GeoServer:
geowave gs fl list -ds my_store
Remove Feature Layer
NAME
geowave-gs-fl-rm - Remove GeoServer feature Layer
SYNOPSIS
geowave gs fl rm <layer name>
geowave geoserver featurelayer rm <layer name>
DESCRIPTION
This command removes a feature layer from the configured GeoServer instance.
EXAMPLES
Remove the hail layer from GeoServer:
geowave gs fl rm hail
Style Commands
Add Style
NAME
geowave-gs-style-add - Add a style to GeoServer
SYNOPSIS
geowave gs style add [options] <style name>
geowave geoserver style add [options] <style name>
DESCRIPTION
This command adds an SLD style file to the configured GeoServer instance.
OPTIONS
* -sld, --stylesld <file>
The SLD to add to GeoServer.
EXAMPLES
Add the my_sld.sld style file to GeoServer with the name my_style:
geowave gs style add -sld my_sld.sld my_style
Get Style
NAME
geowave-gs-style-get - Get GeoServer style info
SYNOPSIS
geowave gs style get <style name>
geowave geoserver style get <style name>
DESCRIPTION
This command returns information about a style from the configured GeoServer instance.
EXAMPLES
Get information about the my_style style on GeoServer:
geowave gs style get my_style
List Styles
NAME
geowave-gs-style-list - List GeoServer styles
SYNOPSIS
geowave gs style list
geowave geoserver style list
DESCRIPTION
This command lists all styles in the configured GeoServer instance.
EXAMPLES
List all styles in GeoServer:
geowave gs style list
Remove Style
NAME
geowave-gs-style-rm - Remove GeoServer Style
SYNOPSIS
geowave gs style rm <style name>
geowave geoserver style rm <style name>
DESCRIPTION
This command removes a style from the configured GeoServer instance.
EXAMPLES
Remove the my_style style from GeoServer:
geowave gs style rm my_style
Set Layer Style
NAME
geowave-gs-style-set - Set GeoServer layer style
SYNOPSIS
geowave gs style set [options] <layer name>
geowave geoserver style set [options] <layer name>
DESCRIPTION
This command sets the layer style to the specified style in the configured GeoServer instance.
OPTIONS
* -sn, --styleName <name>
The name of the style to set on the layer.
EXAMPLES
Set the style on the hail layer to my_style:
geowave gs style set -sn my_style hail
Workspace Commands
Add Workspace
NAME
geowave-gs-ws-add - Add a workspace to GeoServer
SYNOPSIS
geowave gs ws add <workspace name>
geowave geoserver workspace add <workspace name>
DESCRIPTION
This command adds a new workspace to the configured GeoServer instance.
EXAMPLES
Add a new workspace to GeoServer called geowave:
geowave gs ws add geowave
List Workspaces
NAME
geowave-gs-ws-list - List GeoServer workspaces
SYNOPSIS
geowave gs ws list
geowave geoserver workspace list
DESCRIPTION
This command lists all workspaces in the configured GeoServer instance.
EXAMPLES
List all workspaces in GeoServer:
geowave gs ws list
Remove Workspace
NAME
geowave-gs-ws-rm - Remove GeoServer workspace
SYNOPSIS
geowave gs ws rm <workspace name>
geowave geoserver workspace rm <workspace name>
DESCRIPTION
This command removes a workspace from the configured GeoServer instance.
EXAMPLES
Remove the geowave workspace from GeoServer:
geowave gs ws rm geowave
Utility Commands
Miscellaneous operations that don’t really warrant their own top-level command. This includes
commands to start standalone data stores and services.
Standalone Store Commands
Commands that stand up standalone stores for testing and debug purposes.
Run Standalone Accumulo
NAME
geowave-util-accumulo-run - Runs a standalone mini Accumulo server for test and debug with
GeoWave
SYNOPSIS
geowave util accumulo run
DESCRIPTION
This command runs a standalone mini single-node Accumulo server, which can be used locally for
testing and debugging GeoWave, without needing to stand up an entire cluster.
EXAMPLES
Run a standalone Accumulo cluster:
geowave util accumulo run
Run Standalone Bigtable
NAME
geowave-util-bigtable-run - Runs a standalone Bigtable instance for test and debug with GeoWave
SYNOPSIS
geowave util bigtable run [options]
DESCRIPTION
This command runs a standalone Bigtable instance, which can be used locally for testing and
debugging GeoWave, without needing to set up a full instance.
OPTIONS
-d, --directory <path>
The directory to use for Bigtable. Default is ./target/temp.
-i, --interactive <enabled>
Whether to prompt for user input to end the process. Default is true.
-p, --port <host>
The host and port the emulator will run on. Default is 127.0.0.1:8086.
-s, --sdk <sdk>
The name of the Bigtable SDK. Default is google-cloud-sdk-183.0.0-linux-x86_64.tar.gz.
-u, --url <url>
The URL location to download Bigtable. Default is
https://dl.google.com/dl/cloudsdk/channels/rapid/downloads.
EXAMPLES
Run a standalone Bigtable instance:
geowave util bigtable run -d .
Run Standalone Cassandra
NAME
geowave-util-cassandra-run - Runs a standalone Cassandra instance for test and debug with GeoWave
SYNOPSIS
geowave util cassandra run [options]
DESCRIPTION
This command runs a standalone Cassandra instance, which can be used locally for testing and
debugging GeoWave, without needing to set up a full instance.
OPTIONS
-c, --clusterSize <size>
The number of individual Cassandra processes to run. Default is 1.
-d, --directory <path>
The directory to use for Cassandra.
-i, --interactive <enabled>
Whether to prompt for user input to end the process. Default is true.
-m, --maxMemoryMB <size>
The maximum memory to use in MB. Default is 512.
EXAMPLES
Run a standalone Cassandra instance:
geowave util cassandra run -d .
Run Standalone DynamoDB
NAME
geowave-util-dynamodb-run - Runs a standalone DynamoDB instance for test and debug with
GeoWave
SYNOPSIS
geowave util dynamodb run [options]
DESCRIPTION
This command runs a standalone DynamoDB instance, which can be used locally for testing and
debugging GeoWave, without needing to set up a full instance.
OPTIONS
-d, --directory <path>
The directory to use for DynamoDB.
-i, --interactive <enabled>
Whether to prompt for user input to end the process. Default is true.
EXAMPLES
Run a standalone DynamoDB instance:
geowave util dynamodb run -d .
Run Standalone HBase
NAME
geowave-util-hbase-run - Runs a standalone HBase instance for test and debug with GeoWave
SYNOPSIS
geowave util hbase run [options]
DESCRIPTION
This command runs a standalone HBase instance, which can be used locally for testing and debugging
GeoWave, without needing to set up a full instance.
OPTIONS
-a, --auth <authorizations>
A list of authorizations to grant the admin user.
-d, --dataDir <path>
Directory for HBase server-side data. Default is ./lib/services/third-party/embedded-hbase/data.
-i, --interactive
If specified, prompt for user input to end the process.
-l, --libDir <path>
Directory for HBase server-side libraries. Default is ./lib/services/third-party/embedded-hbase/lib.
-r, --regionServers <count>
The number of region server processes. Default is 1.
-z, --zkDataDir <path>
The data directory for the Zookeeper instance. Default is ./lib/services/third-party/embedded-hbase/zookeeper.
EXAMPLES
Run a standalone HBase instance:
geowave util hbase run
Run Standalone Kudu
NAME
geowave-util-kudu-run - Runs a standalone Kudu instance for test and debug with GeoWave
SYNOPSIS
geowave util kudu run [options]
DESCRIPTION
This command runs a standalone Kudu instance, which can be used locally for testing and debugging
GeoWave, without needing to set up a full instance.
OPTIONS
-d, --directory <path>
The directory to use for Kudu. Default is ./target/temp.
-i, --interactive <enabled>
Whether to prompt for user input to end the process. Default is true.
-t, --tablets <count>
The number of tablets to use for Kudu. Default is 0.
EXAMPLES
Run a standalone Kudu instance:
geowave util kudu run -d . -t 2
Run Standalone Redis
NAME
geowave-util-redis-run - Runs a standalone Redis instance for test and debug with GeoWave
SYNOPSIS
geowave util redis run [options]
DESCRIPTION
This command runs a standalone Redis instance, which can be used locally for testing and debugging
GeoWave, without needing to set up a full instance.
OPTIONS
-d, --directory <path>
The directory to use for Redis. If set, the data will be persisted and durable. If not set, a temporary directory will be used and deleted when the process completes.
-i, --interactive <enabled>
Whether to prompt for user input to end the process. Default is true.
-m, --maxMemory <size>
The maximum memory to use (in a form such as 512M or 1G). Default is 1G.
-p, --port <port>
The port for Redis to listen on. Default is 6379.
-s, --setting <setting>
A setting to apply to Redis in the form of <name>=<value>.
EXAMPLES
Run a standalone Redis instance:
geowave util redis run
Accumulo Commands
Utility operations to set Accumulo splits and run a test server.
Run Standalone
NAME
geowave-util-accumulo-run - Runs a standalone mini Accumulo server for test and debug with
GeoWave
SYNOPSIS
geowave util accumulo run
DESCRIPTION
This command runs a standalone mini single-node Accumulo server, which can be used locally for
testing and debugging GeoWave, without needing to stand up an entire cluster.
EXAMPLES
Run a standalone Accumulo cluster:
geowave util accumulo run
Pre-split Partition IDs
NAME
geowave-util-accumulo-presplitpartitionid - Pre-split Accumulo table by providing the number of
partition IDs
SYNOPSIS
geowave util accumulo presplitpartitionid [options] <store name>
DESCRIPTION
This command pre-splits an Accumulo table by providing the number of partition IDs.
OPTIONS
--indexName <name>
The GeoWave index. Default is all indices.
--num <count>
The number of partitions.
EXAMPLES
Pre-split the spatial_idx table to 8 partitions in the example data store:
geowave util accumulo presplitpartitionid --indexName spatial_idx --num 8 example
Split Equal Interval
NAME
geowave-util-accumulo-splitequalinterval - Set Accumulo splits by providing the number of partitions
based on an equal interval strategy
SYNOPSIS
geowave util accumulo splitequalinterval [options] <store name>
DESCRIPTION
This command sets the Accumulo table splits by providing the number of partitions based on an
equal interval strategy.
OPTIONS
--indexName <name>
The GeoWave index. Default is all indices.
--num <count>
The number of partitions.
EXAMPLES
Split the spatial_idx table to 8 partitions using an equal interval strategy in the example data store:
geowave util accumulo splitequalinterval --indexName spatial_idx --num 8 example
Split by Number of Records
NAME
geowave-util-accumulo-splitnumrecords - Set Accumulo splits by providing the number of entries per
split
SYNOPSIS
geowave util accumulo splitnumrecords [options] <store name>
DESCRIPTION
This command sets the Accumulo data store splits by providing the number of entries per split.
OPTIONS
--indexName <name>
The GeoWave index. Default is all indices.
--num <count>
The number of entries.
EXAMPLES
Set the number of entries per split to 1000 on the spatial_idx index of the example data store:
geowave util accumulo splitnumrecords --indexName spatial_idx --num 1000 example
Split Quantile Distribution
NAME
geowave-util-accumulo-splitquantile - Set Accumulo splits by providing the number of partitions based
on a quantile distribution strategy
SYNOPSIS
geowave util accumulo splitquantile [options] <store name>
DESCRIPTION
This command allows a user to set the Accumulo data store splits by providing the number of
partitions based on a quantile distribution strategy.
OPTIONS
--indexName <name>
The GeoWave index. Default is all indices.
--num <count>
The number of partitions.
EXAMPLES
Split the spatial_idx table to 8 partitions using a quantile distribution strategy in the example data
store:
geowave util accumulo splitquantile --indexName spatial_idx --num 8 example
OSM Commands
Operations to ingest OpenStreetMap (OSM) nodes, ways, and relations into GeoWave.
IMPORTANT: OSM commands are not included in GeoWave by default.
Import OSM
NAME
geowave-util-osm-ingest - Ingest and convert OSM data from HDFS to GeoWave
SYNOPSIS
geowave util osm ingest [options] <hdfs host:port> <path to base directory to read from> <store name>
DESCRIPTION
This command will ingest and convert OSM data from HDFS to GeoWave.
OPTIONS
-jn, --jobName
Name of mapreduce job. Default is Ingest (mcarrier).
-m, --mappingFile
Mapping file, imposm3 form.
--table
OSM Table name in GeoWave. Default is OSM.
* -t, --type
Mapper type - one of node, way, or relation.
-v, --visibility
The visibility of the data ingested (optional; default is 'public').
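EXAMPLES
A representative invocation, following the pattern of the other examples in this reference, might look like the following. The HDFS host, base directory, and store name are illustrative placeholders, not documented defaults:

```shell
# Ingest OSM node data previously staged to HDFS into the "example" data store
# (hdfs-host:8020 and /user/geowave/osm are placeholders for your environment)
geowave util osm ingest -t node -v public hdfs-host:8020 /user/geowave/osm example
```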
Stage OSM
NAME
geowave-util-osm-stage - Stage OSM data to HDFS
SYNOPSIS
geowave util osm stage [options] <file or directory> <hdfs host:port> <path to base directory to write to>
DESCRIPTION
This command will stage OSM data from a local directory and write it to HDFS.
OPTIONS
--extension
PBF File extension. Default is .pbf.
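EXAMPLES
A representative invocation, following the pattern of the other examples in this reference, might look like the following. The local file name, HDFS host, and target directory are illustrative placeholders:

```shell
# Stage a local PBF extract to HDFS so it can later be ingested
# (extract.osm.pbf, hdfs-host:8020, and /user/geowave/osm are placeholders)
geowave util osm stage ./extract.osm.pbf hdfs-host:8020 /user/geowave/osm
```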
Python Commands
Commands for use with the GeoWave Python bindings.
Run Py4J Java Gateway
NAME
geowave-util-python-rungateway - Run a Py4J Java gateway
SYNOPSIS
geowave util python rungateway
DESCRIPTION
This command starts the Py4J Java gateway required by pygw.
EXAMPLES
Run the Py4J Java gateway:
geowave util python rungateway
Landsat8 Commands
Operations to analyze, download, and ingest Landsat 8 imagery publicly available on AWS.
Analyze Landsat 8
NAME
geowave-util-landsat-analyze - Print out basic aggregate statistics for available Landsat 8 imagery
SYNOPSIS
geowave util landsat analyze [options]
DESCRIPTION
This command prints out basic aggregate statistics for available Landsat 8 imagery.
OPTIONS
--cql <filter>
An optional CQL expression to filter the ingested imagery. The feature type for the expression has
the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double),
processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene.
Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double),
and bandDownloadUrl (String).
--nbestbands <count>
An option to identify and only use a set number of bands with the best cloud cover.
--nbestperspatial
A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by
path/row.
--nbestscenes <count>
An option to identify and only use a set number of scenes with the best cloud cover.
--sincelastrun
If specified, check the scenes list from the workspace and if it exists, only ingest data since the last
scene.
--usecachedscenes
If specified, run against the existing scenes catalog in the workspace directory if it exists.
-ws, --workspaceDir <path>
A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Analyze the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin,
Germany:
geowave util landsat analyze --nbestperspatial --nbestscenes 1 --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat
Download Landsat 8
NAME
geowave-util-landsat-download - Download Landsat 8 imagery to a local directory
SYNOPSIS
geowave util landsat download [options]
DESCRIPTION
This command downloads Landsat 8 imagery to a local directory.
OPTIONS
--cql <filter>
An optional CQL expression to filter the ingested imagery. The feature type for the expression has
the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double),
processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene.
Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double),
and bandDownloadUrl (String).
--nbestbands <count>
An option to identify and only use a set number of bands with the best cloud cover.
--nbestperspatial
A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by
path/row.
--nbestscenes <count>
An option to identify and only use a set number of scenes with the best cloud cover.
--sincelastrun
If specified, check the scenes list from the workspace and if it exists, only ingest data since the last
scene.
--usecachedscenes
If specified, run against the existing scenes catalog in the workspace directory if it exists.
-ws, --workspaceDir <path>
A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Download the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin,
Germany:
geowave util landsat download --nbestperspatial --nbestscenes 1 --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat
Ingest Landsat 8
NAME
geowave-util-landsat-ingest - Ingest Landsat 8 imagery and metadata into a GeoWave data store
SYNOPSIS
geowave util landsat ingest [options] <store name> <comma delimited index list>
DESCRIPTION
This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave. At the
same time, it ingests the scene metadata as vector data. The raster and vector data can be ingested into
two separate data stores, if desired.
OPTIONS
--converter <converter>
Prior to ingesting an image, this converter will be used to massage the data. The default is not to
convert the data.
--coverage <name>
The name to give to each unique coverage. Freemarker templating can be used for variable
substitution based on the same attributes used for filtering. The default coverage name is
${entityId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together
into the same coverage.
--cql <filter>
An optional CQL expression to filter the ingested imagery. The feature type for the expression has
the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double),
processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene.
Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double),
and bandDownloadUrl (String).
--crop
If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is
provided, this will not have an effect.
--histogram
If specified, store the histogram of the values of the coverage so that histogram equalization will be
performed.
--nbestbands <count>
An option to identify and only use a set number of bands with the best cloud cover.
--nbestperspatial
A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by
path/row.
--nbestscenes <count>
An option to identify and only use a set number of scenes with the best cloud cover.
--overwrite
If specified, overwrite images that are ingested in the local workspace directory. By default it will
keep an existing image rather than downloading it again.
--pyramid
If specified, store an image pyramid for the coverage.
--retainimages
If specified, keep the images that are ingested in the local workspace directory. By default it will
delete the local file after it is ingested successfully.
--sincelastrun
If specified, check the scenes list from the workspace and if it exists, only ingest data since the last
scene.
--skipMerge
By default the ingest will automerge overlapping tiles as a post-processing optimization step for
efficient retrieval, but this option will skip the merge process.
--subsample <factor>
Subsample the image prior to ingest by the scale factor provided. The scale factor should be an
integer value greater than or equal to 1. Default is 1.
--tilesize <size>
The pixel size for each tile stored in GeoWave. Default is 256.
--usecachedscenes
If specified, run against the existing scenes catalog in the workspace directory if it exists.
--vectorindex <index>
When ingesting both vector and raster data, you may want each indexed differently. This option
overrides the index used for vector output.
--vectorstore <store name>
When ingesting both vector and raster data, you may want to ingest the vector data into a different
data store. This option overrides the data store used for vector output.
-ws, --workspaceDir <path>
A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin,
Germany, and output raster data to a landsatraster data store and vector data to a landsatvector data
store:
geowave util landsat ingest --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --vectorstore landsatvector --pyramid --coverage berlin_mosaic landsatraster spatial-idx
Ingest Landsat 8 Raster
NAME
geowave-util-landsat-ingestraster - Ingest Landsat 8 imagery into a GeoWave data store
SYNOPSIS
geowave util landsat ingestraster [options] <store name> <comma delimited index list>
DESCRIPTION
This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave.
OPTIONS
--converter <converter>
Prior to ingesting an image, this converter will be used to massage the data. The default is not to
convert the data.
--coverage <name>
The name to give to each unique coverage. Freemarker templating can be used for variable
substitution based on the same attributes used for filtering. The default coverage name is
${entityId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together
into the same coverage.
--cql <filter>
An optional CQL expression to filter the ingested imagery. The feature type for the expression has
the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double),
processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene.
Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double),
and bandDownloadUrl (String).
--crop
If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is
provided, this will not have an effect.
--histogram
If specified, store the histogram of the values of the coverage so that histogram equalization will be
performed.
--nbestbands <count>
An option to identify and only use a set number of bands with the best cloud cover.
--nbestperspatial
A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by
path/row.
--nbestscenes <count>
An option to identify and only use a set number of scenes with the best cloud cover.
--overwrite
If specified, overwrite images that are ingested in the local workspace directory. By default it will
keep an existing image rather than downloading it again.
--pyramid
If specified, store an image pyramid for the coverage.
--retainimages
If specified, keep the images that are ingested in the local workspace directory. By default it will
delete the local file after it is ingested successfully.
--sincelastrun
If specified, check the scenes list from the workspace and if it exists, only ingest data since the last
scene.
--skipMerge
By default the ingest will automerge overlapping tiles as a post-processing optimization step for
efficient retrieval, but this option will skip the merge process.
--subsample <factor>
Subsample the image prior to ingest by the scale factor provided. The scale factor should be an
integer value greater than or equal to 1. Default is 1.
--tilesize <size>
The pixel size for each tile stored in GeoWave. Default is 512.
--usecachedscenes
If specified, run against the existing scenes catalog in the workspace directory if it exists.
-ws, --workspaceDir <path>
A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin,
Germany, and output raster data to a landsatraster data store:
geowave util landsat ingestraster --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --pyramid --coverage berlin_mosaic landsatraster spatial-idx
Ingest Landsat 8 Metadata
NAME
geowave-util-landsat-ingestvector - Ingest Landsat 8 scene and band metadata into a data store
SYNOPSIS
geowave util landsat ingestvector [options] <store name> <comma delimited index list>
DESCRIPTION
This command ingests Landsat 8 scene and band metadata into a GeoWave data store.
OPTIONS
--cql <filter>
An optional CQL expression to filter the ingested imagery. The feature type for the expression has
the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double),
processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene.
Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double),
and bandDownloadUrl (String).
--nbestbands <count>
An option to identify and only use a set number of bands with the best cloud cover.
--nbestperspatial
A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by
path/row.
--nbestscenes <count>
An option to identify and only use a set number of scenes with the best cloud cover.
--sincelastrun
If specified, check the scenes list from the workspace and if it exists, only ingest data since the last
scene.
--usecachedscenes
If specified, run against the existing scenes catalog in the workspace directory if it exists.
-ws, --workspaceDir <path>
A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest scene and band metadata of the B8 band of Landsat raster data over a bounding box that
roughly surrounds Berlin, Germany to a landsatvector data store:
geowave util landsat ingestvector --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat landsatvector spatial-idx
gRPC Commands
Commands for working with the gRPC service.
Start gRPC Server
NAME
geowave-util-grpc-start - Start the GeoWave gRPC server
SYNOPSIS
geowave util grpc start [options]
DESCRIPTION
This command starts the GeoWave gRPC server on a given port. Remote gRPC clients can interact
with GeoWave through this service.
OPTIONS
-p, --port <port>
The port number the server should run on. Default is 8980.
-n, --nonBlocking
If specified, runs the server in non-blocking mode.
EXAMPLE
Run a gRPC server on port 8980:
geowave util grpc start -p 8980
Stop gRPC Server
NAME
geowave-util-grpc-stop - Stop the GeoWave gRPC server
SYNOPSIS