
Post on 28-Dec-2015


  • Semantic Geospatial Data Integration and Mining for National Security. Ashraful Alam, Bhavani Thuraisingham, Ganesh Subbiah, Latifur Khan (University of Texas at Dallas); Shashi Shekhar (University of Minnesota)

  • Geospatial Data Integration: Motivating Scenario. Query: "Find movie theaters within 30 miles of 75080". Geospatial operators: within, near, overlap. Businesses (non-geospatial data): theaters, restaurants. Distance unit: miles. Geo references: 75080, Richardson.

    Cinemark Movies 10 Radisson Hotel Dallas North-Richardson

  • Key Contributions. The query can be handled by a traditional search engine (Google, Yahoo), but not at the semantic level. Almost no search engine facilitates finding relevant web services or handles complex queries (the exception is Woogle, which offers template matching for web services but no composition). DAGIS: discovers geospatial semantic web services using the OWL-S service ontology coupled with a geospatial domain-specific ontology for automatic discovery, dynamic composition and invocation. It facilitates semantic matching of functional and non-functional services from various heterogeneous, independent data sources.

  • Key Contributions (diagram). Components: Client Browser; DAGIS Query Agent; DAGIS Matchmaker; DAGIS Composer; Web Service Providers A, B, ..., Z, which advertise as semantic services.

    Automatic semantic query generation by the DAGIS Query Agent. Semantic matching using the Matchmaker for functional and QoS parameters. Dynamic on-the-fly composition for service orchestration using the DAGIS Composer.

  • OWL-S Upper Ontology. Mapping to WSDL: communication protocol (RPC, HTTP, ...); marshalling/serialization; transformation to and from XSD to OWL. Control flow of the service; black/grey/glass box view; protocol specification; abstract messages. Capability specification; general features of the service; quality of service; classification in service taxonomies.

  • DAGIS System Architecture

  • Generation of Semantic-Enabled Profile for Geospatial Query. Domain ontology (snapshot) in OWL; generated OWL-S semantic profile. http://www.utdallas.edu/~gxs059000/Query.owl http://www.utdallas.edu/~gxs059000/OGCServiceontology.owl

  • Geospatial Service Selection and Discovery. DAGIS Agent; OWL-S MX Matchmaker. Best service match: functionality, QoS.

    Degrees of match: EXACT < PLUG-IN < SUBSUMES < SUBSUMED-BY
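    The ordering of the degrees of match can be sketched in a few lines. This is only an illustration: it assumes that "<" on the slide reads from the strictest degree (EXACT) to the most relaxed (SUBSUMED-BY), and the service names are hypothetical, not services from the DAGIS registry.

```python
from enum import IntEnum


class MatchDegree(IntEnum):
    """Degrees of match in slide order (assumption: smaller value = stricter)."""
    EXACT = 0
    PLUG_IN = 1
    SUBSUMES = 2
    SUBSUMED_BY = 3


def best_match(candidates):
    """Return the (service, degree) pair with the strictest degree of match."""
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical advertised services and the degree each reached for one query.
candidates = [
    ("TheaterFinderB", MatchDegree.SUBSUMES),
    ("TheaterFinderA", MatchDegree.PLUG_IN),
    ("CinemaLocator", MatchDegree.SUBSUMED_BY),
]

service, degree = best_match(candidates)
print(service, degree.name)
```

    A real matchmaker would also weigh QoS parameters; the sketch ranks on the logical degree alone.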

  • Geospatial Service Invocation

    - OWL-S grounding
    - WSDL grounding
    - Service invocation through AXIS

    GetTheater Atomic Process

  • DAGIS System Flow. DAGIS Query Interface; OWL-S MatchMaker; OWL-DL Reasoner for Matchmaker (Pellet, an open-source OWL DL reasoner: http://pellet.owldl.com/); Service Providers.

  • DAGIS for Complex Queries

    1. Query Profile

    2. Service Discovery

    3. Compose Selection

    4. Construct Sequence

    5. Return Dynamic Service URI

    Diagram components: Client; DAGIS Agent; Match-Maker; DAGIS Composer; Compose Sequencer

    Find Movie Theaters within 30 Miles from Richardson, TX

    6. Service Invocation

  • DAGIS Composer Algorithm: Recursive Back-Chaining Inference Mechanism (Regression Planning)

    Example inputs: Richardson, TX, 30 Miles. Desired output: Movie Theaters.

    GetTheater: Inputs := City, State, Distance; Output := Movie Theaters (no service provider)
    ZipCodeFinder: Inputs := City, State; Output := ZipCode
    Theater Finder: Inputs := ZipCode, Distance; Output := Movie Theaters
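    The back-chaining idea on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAGIS Composer itself: services are reduced to plain input/output name sets, matching is by exact string equality rather than by ontology reasoning, and the `Service` type, `REGISTRY` list and `compose` function are hypothetical names introduced here.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    inputs: frozenset
    output: str


# The two registered providers from the slide, as plain signatures.
REGISTRY = [
    Service("ZipCodeFinder", frozenset({"City", "State"}), "ZipCode"),
    Service("TheaterFinder", frozenset({"ZipCode", "Distance"}), "MovieTheaters"),
]


def compose(goal, available, registry, _pending=frozenset()):
    """Regress `goal` through services; return an ordered plan or None."""
    if goal in available:
        return []                      # the user already supplied this value
    if goal in _pending:
        return None                    # guard against circular dependencies
    for svc in registry:
        if svc.output != goal:
            continue                   # consider only relevant services
        plan = []
        for needed in sorted(svc.inputs):
            sub = compose(needed, available, registry, _pending | {goal})
            if sub is None:
                break                  # this service cannot be satisfied
            plan += [step for step in sub if step not in plan]
        else:
            return plan + [svc.name]   # every input resolved
    return None


plan = compose("MovieTheaters", {"City", "State", "Distance"}, REGISTRY)
print(plan)  # ['ZipCodeFinder', 'TheaterFinder']
```

    Starting from the goal means that services whose output is irrelevant to the query are never explored.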

  • DAGIS Query Interface

  • DAGIS Integration Scenarios

  • Geospatial Data Mining Case Study: Dataset. ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) obtains high-resolution (15 to 90 square meters per pixel) images of the Earth in 14 different wavelengths of the electromagnetic spectrum, ranging from visible to thermal infrared light. ASTER data is used to create detailed maps of land surface temperature, emissivity, reflectivity, and elevation.

  • Case Study: Dataset & Features. The remote sensing data used in this study is an ASTER image acquired on 31 December 2005. It covers the northern part of Dallas, with Dallas-Fort Worth International Airport located in the southwest of the image. ASTER data has 14 channels from the visible through the thermal infrared regions of the electromagnetic spectrum, providing detailed information on surface temperature, emissivity, reflectance, and elevation. ASTER is comprised of the following three radiometers. Visible and Near Infrared Radiometer (VNIR, band 1 through band 3): wavelength range 0.56~0.86 µm.

  • Case Study: Dataset & Features. Short Wavelength Infrared Radiometer (SWIR, band 4 through band 9): wavelength range 1.60~2.43 µm; mid-infrared regions; used to extract surface features. Thermal Infrared Radiometer (TIR, band 10 through band 14): covers 8.125~11.65 µm; important when research focuses on heat, such as identifying mineral resources and observing atmospheric conditions by taking advantage of their thermal infrared characteristics.

  • ASTER Dataset: Technical Challenges. Testing will be done based on pixels. Goal: region-based classification and identification of high-level concepts. Solution: group adjacent pixels that belong to the same class; identify high-level concepts using ontology-based mining.

  • Sketches: Process of Our Approach

  • SVM Classifiers: Atomic Concepts. Different class distribution of training and test sets.

  • Process of Our Approach

  • Ontology-Driven Mining. The ontology will be represented as a directed acyclic graph (DAG). Each node in the DAG represents a concept. Interrelationships are represented by labeled arcs/links. Various kinds of interrelationships are used to create an ontology, such as specialization (Is-a), instantiation (Instance-of), and component membership (Part-of).

    Figure labels: Urban; Residential; Apartment; Single Family Home; Multi-family Home; relations: IS-A, Part-of
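    A DAG with labeled arcs can be held in a small adjacency structure. The sketch below is an illustration only: the `Ontology` class is a hypothetical helper, and the arc directions (housing types IS-A Residential, Residential Part-of Urban) are an assumed reading of the figure labels.

```python
from collections import defaultdict


class Ontology:
    """Concepts as nodes, relationships as labeled arcs (source -> target)."""

    def __init__(self):
        self.arcs = defaultdict(list)  # concept -> [(label, target), ...]

    def add(self, source, label, target):
        self.arcs[source].append((label, target))

    def ancestors(self, concept, label=None):
        """Concepts reachable from `concept`; restrict to `label` if given."""
        found, stack = [], [concept]
        while stack:
            for arc_label, target in self.arcs.get(stack.pop(), []):
                if label in (None, arc_label) and target not in found:
                    found.append(target)
                    stack.append(target)
        return found


onto = Ontology()
for housing in ("Apartment", "Single Family Home", "Multi-family Home"):
    onto.add(housing, "IS-A", "Residential")   # assumed arc direction
onto.add("Residential", "Part-of", "Urban")    # assumed arc direction

print(onto.ancestors("Apartment", "IS-A"))  # ['Residential']
print(onto.ancestors("Apartment"))          # ['Residential', 'Urban']
```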

  • Ontology-Driven Mining. We have developed domain-dependent ontologies. They provide for the specification of fine-grained concepts: the concept Residential Area can be further categorized into concepts such as House, Grass and Tree. Generic ontologies provide concepts at a coarser grain.

  • Ontology Driven Mining

  • Challenges. Region growing: find regions of the same class; find neighboring regions; merge neighboring regions. Not scalable: regions are irregular and of different sizes, and it is hard to track boundaries or neighboring regions. Pixel merging: only neighboring pixels are considered; pixels are converted into concepts; linear.

  • Pixels Merging

  • Complexity. There are two iterations. The first iteration converts signature classes into concepts. The second iteration converts the remaining classes and isolated concepts into dominating classes. Each pixel takes O(1) time, so the target area takes O(n) time, where n is the number of pixels in the target area. Example (next slide): signature classes: c1, c2, c3; non-signature class: c4; concepts: C1, C2, C3.
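    The two iterations can be sketched on a small grid. The slide does not spell out the merging rules, so the sketch makes several assumptions: neighbors are the 4-connected pixels, an "isolated" concept is a pixel with no neighbor of the same concept, and the "dominating" class is the most frequent concept among the neighbors. The function names and the sample grid are hypothetical.

```python
from collections import Counter

# Signature classes map to concepts; the non-signature class c4 has none.
SIGNATURE_TO_CONCEPT = {"c1": "C1", "c2": "C2", "c3": "C3"}


def neighbors(grid, r, c):
    """Yield the values of the (at most four) 4-connected neighbors."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            yield grid[rr][cc]


def merge_pixels(classes):
    # Iteration 1: signature classes become concepts, others stay None.
    first = [[SIGNATURE_TO_CONCEPT.get(k) for k in row] for row in classes]
    # Iteration 2: unlabeled or isolated pixels take the dominating
    # neighbor concept. At most four neighbors, so O(1) work per pixel.
    result = [row[:] for row in first]
    for r, row in enumerate(first):
        for c, concept in enumerate(row):
            near = [n for n in neighbors(first, r, c) if n is not None]
            if near and (concept is None or concept not in near):
                result[r][c] = Counter(near).most_common(1)[0][0]
    return result


classes = [
    ["c1", "c1", "c3"],
    ["c1", "c4", "c1"],
    ["c2", "c2", "c1"],
]
for row in merge_pixels(classes):
    print(row)
```

    Each pass visits every pixel once and inspects a bounded number of neighbors, which gives the O(n) total stated on the slide.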

  • Implementation. Software: ArcGIS 9.1. For programming, we use Visual Basic 6.0 embedded in the software.

  • Output:

  • Directions. Research in geospatial data integration, mining and security is funded by Raytheon Corporation. Started a joint project with Prof. Shashi Shekhar, University of Minnesota, on Spatio-Temporal Data Mining for Crime Analysis, funded by NGA (National Geospatial-Intelligence Agency).

  • Spatio-Temporal Pattern Mining for Multi-Jurisdiction Multi-Timeframe (MJMT) Activity Datasets

    Investigators: Shashi Shekhar (U Minnesota); Bhavani T., L. Khan (U.T. Dallas). Funding agency: NGA.

    Motivation: many applications. Examples: urban crime patterns, sensor data, ... Pattern families: hotspots, journey to crime, trends, ... Tasks: crime prevention, patrol routes/schedules, ...

    Problem definition. Inputs: (i) activity reports with location and time, (ii) pattern families. Output: pattern instances. Objective function: accuracy, scalability. Constraints: urban transportation network.

  • Challenge 1: Spatio-Temporal (ST) Nature of Patterns. State of the art: environmental criminology; spatial methods (hotspots, spatial regression); space-time interaction (Knox test).

    Critical barriers: richer ST semantics, e.g. trends, periodicity, displacement.

    Approach: categorize pattern families; quantify interest measures; design scalable algorithms; evaluate with crime datasets.

    Challenges: trade-off between semantic richness and scalable algorithms.

    Figure: high activity at 2300-0000 hrs; rings = weekdays, slices = hours (Source: US Army ERDC, TEC).

  • Challenge 2: Activities on Urban Infrastructure ST Networks. State of the art: environmental criminology; largely geometric methods; few network methods, e.g. Journey to Crime (J2C).

    Critical barriers: scale (Houston: 100,000 crimes / year); network-based explanation; spatio-temporal networks.

    Approaches: scalable algorithms for J2C analysis; network-based explanatory models; time-aggregated graphs (TAG).

    Challenges: key assumptions are violated, e.g. prefix optimality of shortest paths, so Dijkstra's, A*, etc. cannot be used.

    Figure: (a) Input: pink lines connect crime location and criminal's residence. (b) Output: Journey-to-Crime (thickness = route popularity). Source: Crimestat.
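    A toy network shows why prefix optimality can fail once travel times depend on departure time. The network, its travel times and the function names are hypothetical, and the sketch assumes that waiting at a node is not allowed. A Dijkstra-style search keeps only the earliest arrival at each node, which is exactly the assumption that breaks here.

```python
import heapq

# Hypothetical time-dependent network: EDGES[u] = [(v, travel_time(depart))].
EDGES = {
    "A": [("B", lambda t: 1), ("X", lambda t: 1)],
    "X": [("B", lambda t: 1)],
    "B": [("C", lambda t: 5 if t < 2 else 1)],  # slow before t=2, fast after
    "C": [],
}


def greedy_arrival(source, target, start=0):
    """Dijkstra-style label setting: one earliest-arrival label per node."""
    best = {source: start}
    heap = [(start, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best[u]:
            continue  # stale heap entry
        for v, travel in EDGES[u]:
            arrive = t + travel(t)
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return best[target]


def exhaustive_arrival(node, target, t=0):
    """Earliest arrival over every path (the toy graph is acyclic)."""
    if node == target:
        return t
    return min(
        (exhaustive_arrival(v, target, t + travel(t)) for v, travel in EDGES[node]),
        default=float("inf"),
    )


print(greedy_arrival("A", "C"))      # 6: commits to the early arrival at B
print(exhaustive_arrival("A", "C"))  # 3: the detour via X reaches B later but C sooner
```

    The best route to C passes through B at t=2, although B can be reached at t=1, so the prefix of the optimal path is not itself optimal.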

  • Challenge 3: Multi-Jurisdiction Multi-Temporal (MJMT) Data. State of the art: spatial and ST ontologies; few network ontologies.

    Critical barriers: heterogeneity across networks; uncertainty (map accuracy, GPS, ...).

    Approach: ontologies for ST network activities; integration methods for MJMT data; location accuracy models.

    Challenges: test datasets; evaluation methods.

    Geospatial data integration has become one of the most important characteristics of current information systems. The ability to provide additional dimensions to otherwise monotonic information has led to an enormous increase in the use of geospatial services. For instance, medical data overlaid on digital maps provides a wealth of information for forecasting epidemics, and population research centers can trace genealogical data over a region to discover social trends. Our goal is to enable an integrated geospatial Web where data search and retrieval would be as smooth as browsing web pages. To achieve this goal, we focus on aspects that differ from most current technologies. One, we use a single interface to search, retrieve and update the data. Two, we implement a secure, end-to-end semantic architecture.

    The motivation for our research is to provide seamless integration of geospatial and non-geospatial information with minimal human intervention. "Find movie theaters within 30 miles of 75080" is one such relevant query for this task. This query has the non-geospatial concept Theater, which requires geospatial meaning in this query, the geospatial operator "within", the geospatial reference 75080 and a geographic distance in miles; these give rise to semantic heterogeneities during integration if the geospatial context is not established effectively.

    OWL-S is an OWL ontology to describe Web services. OWL-S leverages OWL to support capability-based discovery of Web services, automatic composition of Web services, and automatic invocation of Web services.

    Service Profile: presented by a service; represents what the service provides. Two main uses: advertisements of Web service capabilities, and requests for Web services with a given set of capabilities.

    What level of service quality does the Web service provide? Security parameters.

    Process Model: describes how a service works (the internal processes of the service); specifies the service interaction protocol; specifies abstract messages (the ontological type of the information transmitted). Facilitates Web service invocation, composition of Web services, and monitoring of interaction.

    The query submitted is disambiguated using our Geospatial Services domain-specific ontology. This ontology is continuously being developed and refined using a bottom-up approach. The next step, after providing definite meanings to the query, is the automatic generation of a semantic query profile using the OWL-S Service Profile ontology. This generated query profile is used by the DAGIS Agent for service discovery and selection of the service providers.

    The DAGIS Agent communicates with the Matchmaker agent for geospatial service selection. Prior to this service discovery, the agents of each service provider advertise their respective OWL-S service profiles to the Matchmaker. The service selection is based on the functional and non-functional requirements of the generated query profile. The Matchmaker framework is based on the OWLS-MX hybrid matchmaker, which performs both logic-based and syntactic matching.

    In the third phase, the DAGIS Agent, which holds the selected service provider's OWL-S URI from the discovery process, invokes the service provider. The Agent invokes the service through OWL-S grounding. The OWL-S grounding then uses WSDL grounding to invoke the Web service using AXIS in our framework.

    This is the DAGIS system architecture, comprising the query interface, the matchmaker, the OWL-DL reasoner (Pellet) and the registered service providers.

    Real-world scenarios inextricably involve more complex queries that necessitate dynamic composition of different service providers. This demands marshalling of geospatial data through multiple services on the fly. The query shown is of this nature. When the matchmaker selects no single service provider for this query, the DAGIS Composer is invoked to select relevant services for composition. The Compose Sequencer then constructs a composite OWL-S service from the selected services. In this case, the semantically composed service comprises the Zip Code Web Service and the Find Theater Web Service.

    This illustrates the working of the DAGIS Composer algorithm. For the previous query, the inputs are the City, State and Distance, and the desired output is Movie Theaters. There is no service provider registered with the matchmaker for this query type. The algorithm is based on backward search, or regression planning. The principal question in regression planning is: what are the states from which applying a given action leads to the goal? The computation of these states is called regressing the goal through the action. The main advantage of this algorithm is that it considers only relevant actions.

    This is the DAGIS interface, with the results for the previous query shown. Commercial search engines lack a single interface that handles queries involving both geospatial and non-geospatial concepts.