Geospatial Categories in Information Retrieval from
Virtual Globes:
Cultural and Linguistic Variation in Geographic Feature Categorization
and Delimitation
David M. Mark, University at Buffalo, USA
• This project is supported by awards from the US National Science Foundation (BCS-0423075), from the National Geospatial-Intelligence Agency, and from Microsoft Research Virtual Earth.
• This work is part of “The Ethnophysiography Project”. The authors wish to thank Andrew Turk, Barry Smith, Werner Kuhn, David Stea, Tom Bittner, Peter Fisher, Carmelita Topaha, Carolyn O’Meara, Boyan Brodaric, and many others for valuable discussions and insights related to the project.
The Problem:
• It seems likely that most people searching the web or trying to retrieve information from the Internet will search by feature names or feature types
• Yet most of the information is in the form of images or fields
• Feature extraction procedures are needed
• Feature categories show cultural variation, and feature delimitation may also vary by culture
Here are some English-language descriptions of some landscape feature types that are referred to by a single word in some other language, but have no single word in English:
• “An area from which you cannot see the sea”
• “A landmass containing an area from which you cannot see the sea”
• “An area of agricultural land reclaimed from a water body or wetland”
• “An ‘island’ of land completely surrounded by one or more younger lava flows”
• “An island of grassland left unburnt after a surrounding wildfire”
• …
Example: Hawai’ian has a word “Kipuka”
• Kipuka: A Hawai’ian word for an ‘island’ of land completely surrounded by one or more younger lava flows
• Icelandic has a word for a lava ‘island’ too!
– Óbrinnishólmi
– Literally, “un-burnt-hill”
• And the Walmajarri language (in Australia) has a word with a similar meaning:
– Nyirirr: an island of grass left unburnt by surrounding fire
• So, if we based our ideas about geographic categories only on the meanings of the words in English, we would miss out on a lot!
• Yet current Geographic Information Systems and databases are based mostly just on English and other dominant languages!
The Gazetteer Triangle: “Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints”
[Figure: triangle with corners Place Name, Geographic Category, Footprint]
Example 1: Standing Water, English-French, Naïve view
         Lac   Étang   Lagune
Lake      X
Pond             X
Lagoon                   X
French and English Categorize Standing Water Bodies Differently
[Figure: two panels, English and French]
Conceptual Model for Water Bodies
• Kinds of bodies of water may be distinguished along several dimensions:
– Size
– Flowing or still
– Salt or fresh water
– Other aspects of water quality
– Origin
– Seasonality of water
– Seasonality of flow
– ...
• Different languages or cultures may give different weights to these factors
Representing GeoCategories for Multilingual Use: What Should We Do?
• Store attributes of generic water entities, rather than categories?
• Represent this as a lattice rather than as a taxonomic tree?
• Does this fit into Werner Kuhn’s idea of “Semantic Reference Systems”?
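The first option, storing attributes rather than categories, can be sketched in a few lines. This is only an illustration: the `WaterBody` fields, the two lexicons, and every threshold below are invented for the example and are not taken from the project or from any dictionary. The point it shows is that one stored entity can be named differently per language without being re-digitized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaterBody:
    """A generic standing-water entity stored by attributes, not by category."""
    area_km2: float       # size
    flowing: bool         # flowing or still
    salinity_ppt: float   # salt or fresh water
    seasonal: bool        # seasonality of water


# Hypothetical lexicons: each term is a rule over the stored attributes.
# All thresholds are made up for illustration only.
ENGLISH = {
    "lake":   lambda w: not w.flowing and w.area_km2 >= 0.1 and w.salinity_ppt < 5,
    "pond":   lambda w: not w.flowing and w.area_km2 < 0.1 and w.salinity_ppt < 5,
    "lagoon": lambda w: not w.flowing and w.salinity_ppt >= 5,
}
FRENCH = {
    "lac":    lambda w: not w.flowing and w.area_km2 >= 1.0 and w.salinity_ppt < 5,
    "étang":  lambda w: not w.flowing and w.area_km2 < 10.0,
    "lagune": lambda w: not w.flowing and w.salinity_ppt >= 5,
}


def terms_for(body, lexicon):
    """Return every term in the lexicon whose rule accepts this water body."""
    return sorted(term for term, rule in lexicon.items() if rule(body))


small_fresh = WaterBody(area_km2=0.5, flowing=False, salinity_ppt=0.2, seasonal=False)
coastal_salt = WaterBody(area_km2=2.0, flowing=False, salinity_ppt=30.0, seasonal=False)

print(terms_for(small_fresh, ENGLISH), terms_for(small_fresh, FRENCH))
print(terms_for(coastal_salt, ENGLISH), terms_for(coastal_salt, FRENCH))
```

Because a body may satisfy several rules at once, the result is a set of candidate terms rather than a single class, which is closer to a lattice than to a taxonomic tree.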
Standing Water, English-French, More complete view
         Lac   Étang   Lagune
Lake      X      X       -
Pond      ?      X       -
Lagoon    ?      X       X
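A many-to-many correspondence like this one is easy to hold as a small lookup structure. The sketch below transcribes the marks from the table; reading "X" as "applies", "?" as "uncertain" and "-" as "does not apply" is an assumption, and the function names are hypothetical.

```python
# Marks transcribed from the English-French standing-water table.
# Assumed legend: "X" = applies, "?" = uncertain, "-" = does not apply.
TABLE = {
    "lake":   {"lac": "X", "étang": "X", "lagune": "-"},
    "pond":   {"lac": "?", "étang": "X", "lagune": "-"},
    "lagoon": {"lac": "?", "étang": "X", "lagune": "X"},
}


def french_candidates(english_term, include_uncertain=False):
    """French terms that may name a feature called `english_term` in English."""
    accepted = {"X", "?"} if include_uncertain else {"X"}
    return [fr for fr, mark in TABLE[english_term].items() if mark in accepted]


def english_candidates(french_term, include_uncertain=False):
    """English terms that may name a feature called `french_term` in French."""
    accepted = {"X", "?"} if include_uncertain else {"X"}
    return [en for en, row in TABLE.items() if row[french_term] in accepted]


print(french_candidates("lake"))
print(french_candidates("pond", include_uncertain=True))
print(english_candidates("étang"))
```

A search for "lake" in a French-language gazetteer would then need to consider both "lac" and "étang" entries, which a one-to-one translation table would miss.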
Example 2: Eminences
• Eminence (Oxford English Dictionary):
– I. In physical senses.
– 1. a. Height, altitude, degree of elevation (obs.)
– b. A lofty or elevated position
– 2. a. A prominence, protuberance. Chiefly in Anat. b. Bot. (See quot. 1688.) Obs.
– 3. An elevation on the earth's surface; a rising ground, hill
Hills and Mountains
• Mountain:
1. a. A large natural elevation of the earth's surface, esp. one high and steep in form (larger and higher than a hill) (OED)
Hills and Mountains
• Hill:
1. a. A natural elevation of the earth's surface … after the introduction of the word ‘mountain’ [into English], gradually restricted to heights of less elevation; … (OED)
Hills and Mountains: Not Only Size!
• Hill: “a more rounded and less rugged outline is also usually connoted by the name” (OED)
Hill Mountain??
“Finger Rock”
“Mitten Buttes”
“Shiprock”
“Picacho Peak”
For features too small to be ‘mountains’, yet too jagged to be ‘hills’, English relies on other terms, such as rock, butte, peak, mesa, etc.
Feature Extraction and Classification
• Determine and implement a suite of measures of the 3-dimensional shapes of eminences
• Look for clusters of features in a parameter space
• Try to distinguish eminence types named and used in English, based on these measures
• Apply the method to DEMs from the Navajo reservation
• See whether eminence types named and used in Navajo can also be defined from these parameters or detected in the parameter space
• Extend to other regions and other languages
• Eventually, develop methods for other higher-level landform types such as canyons and valleys
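The "look for clusters in a parameter space" step can be illustrated with a toy example. The slides do not say which measures or which clustering method are intended, so the sketch below makes both up: two shape measures per eminence (relative height and mean slope, each rescaled to 0 to 1) and a plain k-means loop with hand-picked starting centroids.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain k-means from given starting centroids; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Invented measurements: (relative height, mean slope), both rescaled to 0..1.
eminences = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),  # low and gentle
    (0.20, 0.90), (0.25, 0.85),                # low but very steep
    (0.90, 0.70), (0.85, 0.65),                # high and steep
]
start = [eminences[0], eminences[3], eminences[5]]
labels, centers = kmeans(eminences, start)
print(labels)
```

Whether clusters found this way line up with English terms such as hill, butte and mountain, or with the eminence terms of another language, is exactly the empirical question the list above poses.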
What is needed?
A ‘Middle-ware’ of fundamental landform characteristics
[Diagram labels: “mont”, “montaña”, “mountain”, “berg”, “ ”; Concept 1 to Concept 5; Real world; Geospatial information]
• The landform concept modules (with weights and parameters) could be combined to provide formal definitions of the concepts associated with each landscape term
• The same landform concept modules can also be ‘executed’ against the DEM to detect and delimit regions that have the attributes embodied in the modules, and thus detect and delimit landform instances
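One way to picture the weighted combination of concept modules is sketched below. It is a guess at a possible shape for such middle-ware, not the project's design: the module names, the scaling constants, the target values and the weights are all hypothetical, and the sketch scores already-measured features rather than running against a DEM.

```python
def clamp01(x):
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, x))


# Hypothetical concept modules: each turns raw measurements into a 0..1 score.
MODULES = {
    "tall":   lambda f: clamp01(f["relief_m"] / 1000.0),
    "steep":  lambda f: clamp01(f["mean_slope_deg"] / 45.0),
    "rugged": lambda f: clamp01(f["ruggedness"]),
}

# Hypothetical term definitions: module -> (target score, weight).
# Weights in each definition sum to 1; all numbers are illustrative only.
DEFINITIONS = {
    "mountain": {"tall": (1.0, 0.6), "steep": (0.7, 0.2), "rugged": (0.7, 0.2)},
    "hill":     {"tall": (0.2, 0.5), "steep": (0.2, 0.25), "rugged": (0.1, 0.25)},
    "butte":    {"tall": (0.2, 0.3), "steep": (1.0, 0.4), "rugged": (0.8, 0.3)},
}


def fit(feature, definition):
    """1.0 means the feature matches the definition's targets exactly."""
    penalty = sum(
        weight * abs(MODULES[name](feature) - target)
        for name, (target, weight) in definition.items()
    )
    return 1.0 - penalty


def best_term(feature, definitions=DEFINITIONS):
    """Pick the term whose definition fits the feature best."""
    return max(definitions, key=lambda term: fit(feature, definitions[term]))


gentle_rise = {"relief_m": 150, "mean_slope_deg": 9, "ruggedness": 0.1}
rock_tower = {"relief_m": 200, "mean_slope_deg": 45, "ruggedness": 0.8}
big_massif = {"relief_m": 1200, "mean_slope_deg": 30, "ruggedness": 0.7}

for name, feat in [("gentle_rise", gentle_rise), ("rock_tower", rock_tower),
                   ("big_massif", big_massif)]:
    print(name, best_term(feat))
```

A second language would add its own `DEFINITIONS` table over the same modules, so the modules are shared while the weights and targets vary by language.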
Extraction of Eminences
• Summit-driven eminence detection defines the core of the eminence
• ‘Uphill’ catchments provide one approach to delineate the lower boundaries of eminences; these are exactly hills as defined in 1870 by James Clerk Maxwell (‘On Hills and Dales’)
• Slope, curvature, and breaks of slope might also be used to locate the outer boundaries of eminences
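The first two bullets can be sketched on a toy grid. This is a simplified illustration, not the project's algorithm: summits are taken to be cells strictly higher than all eight neighbours, and each cell is assigned to the summit reached by repeatedly stepping to its highest neighbour (a stand-in for true steepest ascent, since it ignores the longer diagonal distance). The DEM values and function names are invented.

```python
def neighbors(r, c, rows, cols):
    """Yield the up-to-eight grid neighbours of cell (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc


def find_summits(dem):
    """Cells strictly higher than every neighbour, in row-major order."""
    rows, cols = len(dem), len(dem[0])
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if all(dem[r][c] > dem[nr][nc] for nr, nc in neighbors(r, c, rows, cols))
    ]


def uphill_catchments(dem):
    """Label each cell with the cell where a highest-neighbour climb stops."""
    rows, cols = len(dem), len(dem[0])

    def climb(r, c):
        while True:
            nr, nc = max(neighbors(r, c, rows, cols), key=lambda rc: dem[rc[0]][rc[1]])
            if dem[nr][nc] <= dem[r][c]:
                return r, c  # no higher neighbour: a summit (or a flat spot)
            r, c = nr, nc

    return [[climb(r, c) for c in range(cols)] for r in range(rows)]


# Toy elevation grid with two peaks; ties between equal neighbours are
# broken arbitrarily by scan order.
DEM = [
    [1, 2, 3, 2, 1],
    [2, 5, 4, 3, 2],
    [1, 3, 2, 6, 3],
    [1, 2, 3, 4, 2],
]
summits = find_summits(DEM)
labels = uphill_catchments(DEM)
print(summits)
for row in labels:
    print(row)
```

Each distinct label marks one uphill catchment. On real DEMs, flat areas, noise and minor bumps would need handling (for example by a minimum-prominence filter) before the catchments resemble named eminences.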
Classification
• Once eminences are extracted, they can be classified based on properties such as:
– i) Morphology: shape, size, position, orientation, …
– ii) Materials: water, stone, sand, clay, ...
– iii) Spatial relationships: proximity, prominence, topology, ...
The Challenge
• The EU has 23 official languages
• This does not include Catalan, Romansch, Breton, Euskara, Sami, Frisian, …
• World-wide, there are about 5,000 languages that still have 1,000 or more speakers
• There may be about 100 ‘geographical’ terms per language
• This means about 500,000 terms to be defined and implemented!
Thanks!
For more information, contact [email protected] or see http://www.ncgia.buffalo.edu/ethnophysiography/