
Anomaly Detection for Scientific Data Mark Schwabacher NASA ARC, Code TI (formerly IC, TC) ROSES Code S & T Workshop February 17, 2005




What is Anomaly Detection?

• Seek to find parts of the data (“anomalies”) that are different from the rest of the data

• “Supervised” approaches use examples of anomalies; “unsupervised” approaches do not.


How can Anomaly Detection be Applied to Scientific Data?

• Examples:
  – Data from Earth-observing satellites
  – Data from telescopes

• Direct scientists’ attention to anomalies – could lead to scientific discoveries

• Detect errors, so they can be corrected


Example Earth Science Application: Vegetation Data

• Joint work with Ranga Myneni of Boston University

• Used Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FPAR) from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument aboard the Terra and Aqua satellites


Results

• Used MODIS data from one time point at 4 km resolution (7.7 million pixels within Earth’s land area)

• Used 4 variables: LAI, FPAR, QA, and latitude

• Used an unsupervised, distance-based anomaly detection algorithm

• The #1 outlier was in northern Russia and the #2 outlier was in southern New Zealand

• Both points had unusually high LAI and FPAR values for their latitudes

• Investigation revealed a bug in the software that produced the LAI and FPAR products

• Error was corrected, and new versions of the data were made available to the scientific community.


Algorithm Used: Orca (Distance-Based Outliers)

The main idea is to find points in low density regions of the feature space

$P(x) \approx \frac{k}{NV}$

[Figure: sphere of radius d centered on point x]

• V is the total volume within radius d

• N is the total number of examples

• k is the number of examples in sphere

Joint work with Stephen Bay of ISLE
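
The density estimate on this slide can be tried numerically. The following is a minimal sketch, assuming Euclidean distance and assuming that d is chosen as the distance from x to its k-th nearest neighbor; the names `ball_volume` and `knn_density` and the sample points are hypothetical and do not come from the talk or from Orca.

```python
import math

def ball_volume(radius, dim):
    """Volume of a dim-dimensional ball of the given radius."""
    return (math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)) * radius ** dim

def knn_density(x, data, k):
    """Estimate P(x) ~ k / (N * V), where V is the volume of the ball
    around x whose radius d just reaches the k-th nearest example."""
    dists = sorted(math.dist(x, p) for p in data)
    d = dists[k - 1]                     # radius reaching the k-th neighbor
    n = len(data)                        # N: total number of examples
    return k / (n * ball_volume(d, len(x)))

# Hypothetical 2-D data: a tight cluster near the origin.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
dense = knn_density((0.05, 0.04), data, k=3)   # query inside the cluster
sparse = knn_density((2.0, 2.0), data, k=3)    # query far from the cluster
print(dense > sparse)   # True
```

A point far from the cluster needs a much larger sphere to capture k examples, so its estimated density is lower, which is what marks it as a candidate outlier.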


Orca Algorithm

• Based on nested loops
  – For each example, find its nearest neighbors with a sequential scan

• Modified with a pruning rule
  – While performing the sequential scan,
    • Keep track of closest neighbors found so far
    • Prune examples once the neighbors found so far indicate that the example cannot be a top outlier

• Worst case O(N²) distance computations

• In practice, runs in nearly linear time

• Can handle millions of data points
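
The nested loop with pruning can be sketched as follows. This is a simplified illustration, not the Orca implementation: it assumes Euclidean distance, scores each example by the distance to its k-th nearest neighbor, and shuffles the scan order (an assumption made here so that close neighbors tend to be found early). The function name `top_outliers` and the sample points are hypothetical.

```python
import heapq
import math
import random

def top_outliers(data, k=3, n_top=2, seed=0):
    """Return the n_top (score, index) pairs with the largest distance to
    their k-th nearest neighbor, using a nested loop with pruning."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)   # assumed: random scan order
    top = []        # min-heap of (score, index); top[0] is the weakest top outlier
    cutoff = 0.0    # score a candidate must exceed to enter the top list
    for i in range(len(data)):
        nearest = []                     # max-heap (negated) of k closest distances so far
        pruned = False
        for j in order:                  # sequential scan over the other examples
            if j == i:
                continue
            dist = math.dist(data[i], data[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -dist)
            elif dist < -nearest[0]:
                heapq.heapreplace(nearest, -dist)
            # -nearest[0] can only shrink, so it bounds the final score from above.
            if len(nearest) == k and len(top) == n_top and -nearest[0] <= cutoff:
                pruned = True            # cannot beat the weakest top outlier
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n_top:
            heapq.heappush(top, (score, i))
        elif score > cutoff:
            heapq.heapreplace(top, (score, i))
        if len(top) == n_top:
            cutoff = top[0][0]
    return sorted(top, reverse=True)

# Hypothetical data: a tight cluster plus two isolated points (indices 5 and 6).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (-3.0, 4.0)]
print([i for _, i in top_outliers(points, k=3, n_top=2)])   # [5, 6]
```

Most examples in the cluster are abandoned after only a few distance computations, which is the effect the pruning rule relies on; the worst case still scans every pair.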


Conclusions

• Anomaly detection algorithms can find previously unknown anomalies in large scientific data sets

• Could lead to scientific discoveries or correction of errors

• Different algorithms find qualitatively different anomalies, so it is worth running multiple algorithms

• I presented one algorithm (Orca) that runs in nearly linear time, so it can be applied to very large data sets


Pruning

Outliers based on distance to the 3rd nearest neighbor (k=3)

[Figure: sequential scan over a table of example records, with a sphere of radius d around point x]

d is distance to 3rd nearest neighbor for the weakest top outlier
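
To make the role of the cutoff d concrete, the sketch below scans the data for a single candidate x and reports how many distance computations were needed before it could be pruned. It assumes Euclidean distance and k = 3; the function name `scan_until_pruned`, the cutoff value, and the sample points are hypothetical.

```python
import math

def scan_until_pruned(x, others, k, cutoff):
    """Scan others sequentially, tracking the k smallest distances to x.
    Stop as soon as the k-th smallest so far is <= cutoff, because the
    final k-th neighbor distance can only shrink from there.
    Returns (pruned, number of distance computations used)."""
    nearest = []                       # k smallest distances seen so far
    for count, p in enumerate(others, start=1):
        nearest.append(math.dist(x, p))
        nearest.sort()
        del nearest[k:]                # keep only the k closest
        if len(nearest) == k and nearest[-1] <= cutoff:
            return True, count
    return False, len(others)

# Hypothetical data; cutoff d = 1.0 stands in for the weakest top outlier's score.
others = [(0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (4.0, 4.0), (5.0, 5.0)]
print(scan_until_pruned((0.0, 0.0), others, k=3, cutoff=1.0))   # (True, 3)
print(scan_until_pruned((9.0, 9.0), others, k=3, cutoff=1.0))   # (False, 5)
```

The first candidate lies in a dense region and is dropped after three comparisons; the second never finds three neighbors within d, so it survives the full scan and remains a candidate top outlier.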
