Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Data Warehousing, Data Mining and Security October 8, 2010

TRANSCRIPT

Page 1: Data and Applications Security  Developments and Directions

Data and Applications Security Developments and Directions

Dr. Bhavani Thuraisingham

The University of Texas at Dallas

Data Warehousing, Data Mining and Security

October 8, 2010

Page 2: Data and Applications Security  Developments and Directions

Outline

Background on Data Warehousing

Security Issues for Data Warehousing

Data Mining and Security

Page 3: Data and Applications Security  Developments and Directions

What is a Data Warehouse?

A Data Warehouse is a:

- Subject-oriented

- Integrated

- Nonvolatile

- Time variant

- Collection of data in support of management’s decisions

- From: Building the Data Warehouse by W. H. Inmon, John Wiley and Sons

Integration of heterogeneous data sources into a repository

Summary reports, aggregate functions, etc.

Page 4: Data and Applications Security  Developments and Directions

Example Data Warehouse

Oracle DBMS for Employees

Sybase DBMS for Projects

Informix DBMS for Medical

Data Warehouse: data correlating Employees with Medical Benefits and Projects

Could be any DBMS; usually based on the relational data model

Users query the Warehouse
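A minimal sketch of the example above, assuming three small in-memory lists stand in for the Oracle, Sybase, and Informix sources. The table names, column names, and rows are hypothetical, and SQLite is used only because it ships with Python; the slide does not prescribe a particular warehouse DBMS. The sketch loads the sources into one relational store and runs a summary query with aggregate functions.

```python
import sqlite3

# Hypothetical extracts from the three source systems (illustrative data only).
employees = [(1, "Ann"), (2, "Bob"), (3, "Cho")]   # employee source: (emp_id, name)
projects = [(1, "P1"), (2, "P1"), (3, "P2")]       # project source: (emp_id, project)
medical = [(1, 1200.0), (2, 800.0), (3, 500.0)]    # medical source: (emp_id, benefit cost)


def build_warehouse():
    """Load the three sources into one relational store and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE assignment (emp_id INTEGER, project TEXT)")
    conn.execute("CREATE TABLE benefit (emp_id INTEGER, cost REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", employees)
    conn.executemany("INSERT INTO assignment VALUES (?, ?)", projects)
    conn.executemany("INSERT INTO benefit VALUES (?, ?)", medical)
    return conn


def benefits_by_project(conn):
    """Summary report: head count and total medical benefit cost per project."""
    rows = conn.execute(
        """
        SELECT a.project, COUNT(*) AS head_count, SUM(b.cost) AS total_cost
        FROM assignment a JOIN benefit b ON a.emp_id = b.emp_id
        GROUP BY a.project
        ORDER BY a.project
        """
    )
    return rows.fetchall()


report = benefits_by_project(build_warehouse())
```

Users query only the integrated store; the per-source systems stay untouched after loading.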

Page 5: Data and Applications Security  Developments and Directions

Some Data Warehousing Technologies

Heterogeneous Database Integration

Statistical Databases

Data Modeling

Metadata

Access Methods and Indexing

Language Interface

Database Administration

Parallel Database Management

Page 6: Data and Applications Security  Developments and Directions

Data Warehouse Design

Appropriate data model is key to designing the warehouse

Higher-level model in stages

- Stage 1: Corporate data model

- Stage 2: Enterprise data model

- Stage 3: Warehouse data model

Middle-level data model

- A model, possibly one for each subject area in the higher-level model

Physical data model

- Include features such as keys in the middle-level model

Need to determine appropriate levels of granularity of data in order to build a good data warehouse

Page 7: Data and Applications Security  Developments and Directions

Distributing the Data Warehouse

Issues similar to distributed database systems

[Figure: two panels, "Non-distributed Warehouse" and "Distributed Warehouse". Labels: Central Bank, Branch A, Branch B, Central Warehouse, Branch A Warehouse, Branch B Warehouse.]

Page 8: Data and Applications Security  Developments and Directions

Multidimensional Data Model

[Figure: multidimensional data model. Labels: Project Name, Project Leader, Project Sponsor, Project Cost, Project Duration; Dollars, Pounds, Yen; Years, Months, Weeks.]

Page 9: Data and Applications Security  Developments and Directions

Indexing for Data Warehousing

Bit-maps

Multi-level indexing

Storing parts or all of the index files in main memory

Dynamic indexing
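As an illustration of the first item, here is a minimal bitmap-index sketch. It assumes low-cardinality columns and uses Python integers as bit vectors; the column names and rows are hypothetical, and production warehouses typically use compressed bitmaps rather than plain integers.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical warehouse columns, one entry per row.
currency = ["Dollars", "Yen", "Dollars", "Pounds", "Dollars"]
unit = ["Years", "Months", "Months", "Years", "Years"]

currency_idx = build_bitmap_index(currency)
unit_idx = build_bitmap_index(unit)

# A conjunctive predicate becomes a single bitwise AND of two bitmaps.
matches = rows_of(currency_idx["Dollars"] & unit_idx["Years"])
```

The appeal for warehouse queries is that multi-column filters reduce to bitwise operations before any row is fetched.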

Page 10: Data and Applications Security  Developments and Directions

Metadata Mappings

[Figure: Metadata for Data Source A, Metadata for Data Source B, and Metadata for Data Source C, each connected through Metadata for Mappings and Transformations to the Metadata for the Warehouse.]

Page 11: Data and Applications Security  Developments and Directions

Data Warehousing and Security

Security for integrating the heterogeneous data sources into the repository

- e.g., Heterogeneous Database System Security, Statistical Database Security

Security for maintaining the warehouse

- Query, Updates, Auditing, Administration, Metadata

Multilevel Security

- Multilevel Data Models, Trusted Components

Page 12: Data and Applications Security  Developments and Directions

Example Secure Data Warehouse

[Figure labels: User; Secure Data Warehouse Manager; Secure Warehouse; Secure DBMS A, Secure DBMS B, Secure DBMS C; a Secure Database under each secure DBMS.]

Page 13: Data and Applications Security  Developments and Directions

Secure Data Warehouse Technologies

Secure Data Warehousing Technologies:

Secure data modeling

Secure heterogeneous database integration

Database security

Secure access methods and indexing

Secure query languages

Secure database administration

Secure high performance computing technologies

Secure metadata management

Page 14: Data and Applications Security  Developments and Directions

Security for Integrating Heterogeneous Data Sources

Integrating multiple security policies into a single policy for the warehouse

- Apply techniques for federated database security?

- Need to transform the access control rules

Security impact on schema integration and metadata

- Maintaining transformations and mappings

Statistical database security

- Inference and aggregation

- e.g., Average salary in the warehouse could be unclassified while the individual salaries in the databases could be classified

Administration and auditing
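The salary example can be made concrete with a short sketch. It assumes a hypothetical warehouse function that releases only averages and refuses small query sets; the names, salaries, and the `min_group_size` parameter are illustrative. The point is that two permitted aggregate answers can still be combined to infer one classified value.

```python
def average_salary(salaries, names, min_group_size=3):
    """Release an average only when the query set has at least min_group_size members."""
    group = [salaries[name] for name in names]
    if len(group) < min_group_size:
        raise PermissionError("query set too small")
    return sum(group) / len(group)


# Individual salaries are classified; only averages are meant to be released.
salaries = {"Ann": 50000, "Bob": 60000, "Cho": 70000, "Dee": 80000}

avg_all = average_salary(salaries, ["Ann", "Bob", "Cho", "Dee"])
avg_without_dee = average_salary(salaries, ["Ann", "Bob", "Cho"])

# Differencing two allowed answers recovers Dee's classified salary.
inferred_dee = avg_all * 4 - avg_without_dee * 3
```

A size threshold alone therefore does not close the inference channel; the controller also has to reason about overlapping query sets.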

Page 15: Data and Applications Security  Developments and Directions

Security Policy for the Warehouse

Federated policies become warehouse policies?

[Figure: Security Policy Integration and Transformation. Labels: Component Policy, Generic Policy, and Export Policy for each of Components A, B, and C; Federated Policy for Federation F1; Federated Policy for Federation F2.]

Page 16: Data and Applications Security  Developments and Directions

Security Policy for the Warehouse - II

[Figure: Policy for Data Source A, Policy for Data Source B, and Policy for Data Source C, each connected through a Policy for Mappings and Transformations to the Policy for the Warehouse.]
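One possible reading of the policy figure, sketched under a strong assumption: a role may read a warehouse column only if every contributing source column allows that role (a "most restrictive wins" rule). The slides pose policy integration as an open question, so this rule, the function name, and the sample roles are all hypothetical.

```python
def integrate_policies(source_policies, mappings):
    """Derive a warehouse read policy from per-source policies.

    source_policies: {source: {column: set of roles allowed to read}}
    mappings: {warehouse_column: [(source, column), ...]}
    Assumed rule: intersect the roles of all contributing source columns.
    """
    warehouse_policy = {}
    for wh_column, origins in mappings.items():
        allowed = None
        for source, column in origins:
            roles = source_policies[source].get(column, set())
            allowed = set(roles) if allowed is None else allowed & roles
        warehouse_policy[wh_column] = allowed or set()
    return warehouse_policy


# Hypothetical source policies and mapping metadata.
source_policies = {
    "A": {"salary": {"hr", "auditor"}},
    "B": {"benefit": {"hr", "doctor"}},
}
mappings = {
    "salary": [("A", "salary")],
    "total_compensation": [("A", "salary"), ("B", "benefit")],
}

policy = integrate_policies(source_policies, mappings)
```

Intersection is conservative: it never grants more than any source would, at the cost of possibly denying access that a derived aggregate could safely allow.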

Page 17: Data and Applications Security  Developments and Directions

Secure Data Warehouse Model

Dollars, S

Pounds, S

Yen, S

Year, U

Months, U

Weeks, U

Project Name, U

Project Leader, U

Project Sponsor, S

Project Cost, S

Project Duration, U

U = Unclassified, S = Secret
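A minimal sketch of how the labels above might be enforced, assuming a simple rule that a user sees an attribute only when the user's clearance is at least the attribute's level. The rule, the function name, and the sample record are assumptions for illustration; only the attribute labels come from the slide.

```python
LEVELS = {"U": 0, "S": 1}  # Unclassified < Secret

# Attribute labels taken from the slide.
LABELS = {
    "Project Name": "U",
    "Project Leader": "U",
    "Project Sponsor": "S",
    "Project Cost": "S",
    "Project Duration": "U",
}


def visible_view(record, clearance):
    """Return only the attributes whose label does not exceed the user's clearance."""
    return {
        attr: value
        for attr, value in record.items()
        if LEVELS[LABELS[attr]] <= LEVELS[clearance]
    }


# Hypothetical warehouse record.
record = {
    "Project Name": "Alpha",
    "Project Leader": "Ann",
    "Project Sponsor": "Agency X",
    "Project Cost": 1000000,
    "Project Duration": 2,
}

unclassified_view = visible_view(record, "U")
secret_view = visible_view(record, "S")
```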

Page 18: Data and Applications Security  Developments and Directions

Methodology for Developing a Secure Data Warehouse

Secure data sources

Clean/modify data sources. Integrate policies

Integrate secure data sources

Build secure data model, schemas, access methods, and index strategies for the secure warehouse

Page 19: Data and Applications Security  Developments and Directions

Multi-Tier Architecture

Tier 1: Secure Data Sources

Tier 2: Builds on Tier 1

...

Tier N: Secure Data Warehouse, builds on Tier N-1

Each layer builds on the previous layer: schemas/metadata/policies

Page 20: Data and Applications Security  Developments and Directions

Administration

Roles of Database Administrators, Warehouse Administrators, Database System Security officers, and Warehouse System Security Officers?

When databases are updated, can a trigger mechanism be used to automatically update the warehouse?

- i.e., will the individual database administrators permit such a mechanism?
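A sketch of the trigger idea, under a simplifying assumption: the source table and the warehouse table live in the same SQLite database, whereas in practice the sources are separate DBMSs and the propagation would cross system boundaries (which is where the administrators' permission comes in). Table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_salary (emp_id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE warehouse_salary_history (emp_id INTEGER, salary REAL);

    -- Fires after every source update and appends the new value to the warehouse.
    CREATE TRIGGER propagate_update AFTER UPDATE ON source_salary
    BEGIN
        INSERT INTO warehouse_salary_history VALUES (NEW.emp_id, NEW.salary);
    END;
    """
)

conn.execute("INSERT INTO source_salary VALUES (1, 50000)")
conn.execute("UPDATE source_salary SET salary = 55000 WHERE emp_id = 1")

history = conn.execute(
    "SELECT emp_id, salary FROM warehouse_salary_history"
).fetchall()
```

The warehouse table is append-only here, which matches the nonvolatile, time-variant character of a warehouse.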

Page 21: Data and Applications Security  Developments and Directions

Auditing

Should the Warehouse be audited?

- Advantages: keep up-to-date information on access to the warehouse

- Disadvantages: may need to keep unnecessary data in the warehouse; may need a lower-level granularity of data; may cause changes to the timing of data entry to the warehouse, as well as backup and recovery restrictions

Need to determine the relationships between auditing the warehouse and auditing the databases

Page 22: Data and Applications Security  Developments and Directions

Multilevel Security

Multilevel data models

- Extensions to the data warehouse model to support classification levels

Trusted Components

- How much of the warehouse should be trusted?

- Should the transformations be trusted?

Covert channels, inference problem

Page 23: Data and Applications Security  Developments and Directions

Inference Controller

[Figure labels: User; Inference Controller; Secure Data Warehouse Manager; Secure Warehouse; Secure DBMS A, Secure DBMS B, Secure DBMS C; a Secure Database under each secure DBMS.]

Page 24: Data and Applications Security  Developments and Directions

Status and Directions

Commercial data warehouse vendors are incorporating role-based security (e.g., Oracle)

Many topics need further investigation

- Building a secure data warehouse

- Policy integration

- Secure data model

- Inference control

Page 25: Data and Applications Security  Developments and Directions

Data Mining for Counter-terrorism

Data Mining for Non-real-time Threats: gather data, build terrorist profiles, mine data, prune results

Data Mining for Counter-terrorism

Data Mining for Real-time Threats: gather data in real time, build real-time models, mine data, report results

Page 26: Data and Applications Security  Developments and Directions

Data Mining Needs for Counterterrorism: Non-real-time Data Mining

Gather data from multiple sources

- Information on terrorist attacks: who, what, where, when, how

- Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .

- Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .

Integrate the data, build warehouses and federations

Develop profiles of terrorists, activities/threats

Mine the data to extract patterns of potential terrorists and predict future activities and targets

Find the “needle in the haystack” - suspicious needles?

Data integrity is important

Techniques have to SCALE

Page 27: Data and Applications Security  Developments and Directions

Data Mining for Non Real-time Threats

Data sources with information about terrorists and terrorist activities

Integrate data sources

Clean/modify data sources

Build profiles of terrorists and activities

Mine the data

Examine results / prune results

Report final results

Page 28: Data and Applications Security  Developments and Directions

Data Mining Needs for Counterterrorism: Real-time Data Mining

Nature of data

- Data arriving from sensors and other devices: continuous data streams

- Breaking news, video releases, satellite images

- Some critical data may also reside in caches

Rapidly sift through the data and discard unwanted data for later use and analysis (non-real-time data mining)

Data mining techniques need to meet timing constraints

Quality of service (QoS) tradeoffs among timeliness, precision and accuracy

Presentation of results, visualization, real-time alerts and triggers
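A small sketch of the sifting step, under one interpretation of the slide: items judged relevant stay in a bounded in-memory window for real-time mining, and everything else is set aside for later non-real-time analysis. The relevance test, the window size, and the sample readings are hypothetical.

```python
from collections import deque


def sift(stream, is_relevant, window_size=3):
    """Split a stream into a bounded window of relevant items and a deferred list."""
    recent = deque(maxlen=window_size)  # bounded memory for real-time work
    deferred = []                       # set aside for non-real-time mining
    for item in stream:
        if is_relevant(item):
            recent.append(item)         # oldest item drops out when the window is full
        else:
            deferred.append(item)
    return list(recent), deferred


# Hypothetical sensor readings; values of 7 or more count as relevant here.
readings = [1, 9, 2, 8, 7, 10]
window, later = sift(readings, lambda value: value >= 7)
```

Bounding the window trades completeness for predictable memory and latency, which is one form of the QoS tradeoff listed above.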

Page 29: Data and Applications Security  Developments and Directions

Data Mining for Real-time Threats

Data sources with information about terrorists and terrorist activities

Integrate data sources in real time

Rapidly sift through data and discard irrelevant data

Build real-time models

Mine the data

Examine results in real time

Report final results

Page 30: Data and Applications Security  Developments and Directions

Data Mining Outcomes and Techniques for Counter-terrorism

Association: John and James often seen together after an attack

Link Analysis: Follow chain from A to B to C to D

Clustering: Divide population; people from Country X of a certain religion; people from Country Y interested in airplanes

Classification: Build profiles of terrorists and classify terrorists

Anomaly Detection: John registers at flight school, but does not care about takeoff or landing

Data Mining Outcomes and Techniques
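The link-analysis outcome ("follow chain from A to B to C to D") can be sketched as a shortest-path search over an undirected graph of observed links. Breadth-first search is used here as one simple choice; the slide does not name an algorithm, and the link data is hypothetical.

```python
from collections import deque


def find_chain(links, start, goal):
    """Breadth-first search for a shortest chain of links from start to goal."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent pointers back to the start to rebuild the chain.
            chain = []
            while node is not None:
                chain.append(node)
                node = parents[node]
            return chain[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no chain connects start and goal


# Hypothetical observed links between entities.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
chain = find_chain(links, "A", "D")
```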

Page 31: Data and Applications Security  Developments and Directions

Example Success Story - COPLINK

COPLINK developed at University of Arizona

- Research transferred to an operational system currently in use by Law Enforcement Agencies

What does COPLINK do?

- Provides integrated system for law enforcement; integrating law enforcement databases

- If a crime occurs in one state, this information is linked to similar cases in other states

- It has been stated that the sniper shooting case may have been solved earlier if COPLINK had been operational at that time

Page 32: Data and Applications Security  Developments and Directions

Where are we now?

We have some tools for

- building data warehouses from structured data

- integrating structured heterogeneous databases

- mining structured data

- forming some links and associations

- information retrieval tools

- image processing and analysis

- pattern recognition

- video information processing

- visualizing data

- managing metadata

Page 33: Data and Applications Security  Developments and Directions

What are our challenges?

Do the tools scale for large heterogeneous databases and petabyte-sized databases?

Building models in real-time; need training data

Extracting metadata from unstructured data

Mining unstructured data

Extracting useful patterns from knowledge-directed data mining

Rapidly forming links and associations; get the big picture for real-time data mining

Detecting/preventing cyber attacks

Mining the web

Evaluating data mining algorithms

Conducting risk analysis / economic impact

Building testbeds

Page 34: Data and Applications Security  Developments and Directions

IN SUMMARY:

Data Mining is very useful to solve Security Problems

- Data mining tools could be used to examine audit data and flag abnormal behavior

- Much recent work in intrusion detection (unit #18), e.g., neural networks to detect abnormal patterns

- Tools are being examined to determine abnormal patterns for national security: classification techniques, link analysis

- Fraud detection: credit cards, calling cards, identity theft, etc.
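As a sketch of examining audit data to flag abnormal behavior, the snippet below applies a simple z-score rule to per-user access counts. This is a stand-in chosen for brevity, not the neural-network approach mentioned above; the threshold, function name, and audit counts are hypothetical.

```python
from statistics import mean, pstdev


def flag_abnormal(access_counts, threshold=2.0):
    """Return users whose access count deviates from the mean by more than threshold sigmas."""
    values = list(access_counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all users behave identically; nothing stands out
    return sorted(
        user
        for user, count in access_counts.items()
        if abs(count - mu) / sigma > threshold
    )


# Hypothetical audit summary: warehouse queries per user in one day.
audit = {"u1": 10, "u2": 12, "u3": 11, "u4": 9, "u5": 10, "u6": 95}
suspects = flag_abnormal(audit)
```

A rule this simple is easily skewed by the outlier it is trying to find, so it works best as a first filter ahead of more robust techniques.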

BUT CONCERNS FOR PRIVACY