Engineering Data Analysis & Modeling
Practical Solutions to Practical Problems
Dr. James McNames
Biomedical Signal Processing Laboratory
Electrical & Computer Engineering
Portland State University
Course Overview
• Key question: How to extract useful information from data?
• Some theory
• Mostly methods & applications
• Problem oriented, not technology focused
• Project course
Talk Overview
• Problem definitions
• Applications
• Project ideas
• Course specifics
Problem Definitions
• Preprocessing (briefly)
– Variable selection
– Dimension reduction
• Decision theory (hypothesis testing)
• Density estimation
• Nonlinear optimization
• Pattern recognition/Classification (very briefly)
• Nonlinear modeling (univariate & multivariate)
Variable Selection
[Figure: block diagram. Inputs P(t) (previous price), C(t) (competitor's price), and G(t) (Greenspan's BP) feed a nonlinear model whose output is P(t+1).]
• Many algorithms fail if there are too many inputs
• Often fewer inputs are sufficient due to
– Redundant inputs
– Irrelevant inputs
• Goal: Find a subset of inputs that maximizes model accuracy
• Is Greenspan’s BP relevant?
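One way to make the subset search above concrete is greedy forward selection scored on held-out data. The sketch below is only an illustration under stated assumptions: it swaps a linear least-squares model in for the nonlinear model, uses synthetic data in which the third input is irrelevant by construction, and the helper names `holdout_mse` and `forward_select` are hypothetical.

```python
import numpy as np

def holdout_mse(X_train, y_train, X_val, y_val, cols):
    """Fit least squares on the chosen input columns; return validation MSE."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, cols]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.mean((A_val @ coef - y_val) ** 2))

def forward_select(X_train, y_train, X_val, y_val):
    """Greedily add the input that lowers validation error most; stop when none helps."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    # Baseline: predict the training mean with no inputs at all.
    best = float(np.mean((y_val - y_train.mean()) ** 2))
    while remaining:
        scores = {j: holdout_mse(X_train, y_train, X_val, y_val, selected + [j])
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining input improves the validation error
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic data (assumption): the columns play the roles of P(t), C(t), G(t),
# and the G(t) column has no effect on the output by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

selected, val_mse = forward_select(X[:150], y[:150], X[150:], y[150:])
names = ["P(t)", "C(t)", "G(t)"]
print([names[j] for j in selected], val_mse)
```

Greedy selection does not guarantee the best subset; it only avoids testing every combination of inputs, which grows exponentially with the number of inputs.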
Dimension Reduction
• Redundant inputs can also be combined into a smaller composite set
– Improves accuracy
– Reduces computation
• If done well, minimal information is lost
• Used for signal compression
• Principal component analysis is most common
[Figure: block diagram. Raw inputs x pass through a dimension reduction stage that produces features u; the features feed a nonlinear model whose output is y.]
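As a rough illustration of principal component analysis, the sketch below computes the components from the singular value decomposition of the centered data. The data are synthetic (three redundant inputs driven by one hidden variable), and the function name `pca` and its return values are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto its leading principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    features = Xc @ components.T
    return features, components, mean, explained

# Synthetic raw inputs x (assumption): three noisy copies of one hidden variable.
rng = np.random.default_rng(0)
hidden = rng.normal(size=500)
X = np.column_stack([hidden, 2.0 * hidden, -hidden])
X = X + 0.05 * rng.normal(size=X.shape)

u, components, mean, explained = pca(X, n_components=1)

# Reconstruct the raw inputs from the single feature to see how much was lost.
X_hat = u @ components + mean
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
print(explained[0], rel_error)
```

Here one feature carries almost all of the variance because the three inputs are redundant; with real data the number of components to keep is a judgment call.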
Dimension Reduction Example 1
Dimension Reduction Example 2
Nonlinear Optimization
• Find the vector a such that E(a) is minimized
• Many algorithms have parameters that must be “fit” to the data
• Usually “fit” by minimizing error measure
• Sometimes subject to a constraint G(a) = 0
• Unconstrained optimization more common
• Very widely used
• Many engineering applications
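A minimal sketch of unconstrained minimization of E(a), assuming a two-parameter exponential model fit to synthetic data by gradient descent with a backtracking line search. The model, the data, and the names `error`, `gradient`, and `minimize` are all illustrative choices, not taken from the slides.

```python
import numpy as np

def error(a, x, y):
    """E(a): mean squared error of the model a[0] * exp(a[1] * x)."""
    return float(np.mean((a[0] * np.exp(a[1] * x) - y) ** 2))

def gradient(a, x, y):
    """Analytic gradient of E(a) with respect to a[0] and a[1]."""
    e = np.exp(a[1] * x)
    r = a[0] * e - y
    return np.array([2.0 * np.mean(r * e), 2.0 * np.mean(r * a[0] * x * e)])

def minimize(a_start, x, y, max_iter=5000, tol=1e-8):
    """Gradient descent; each step is halved until the error decreases enough."""
    a = np.asarray(a_start, dtype=float)
    for _ in range(max_iter):
        g = gradient(a, x, y)
        if np.linalg.norm(g) < tol:
            break
        step, e_now = 1.0, error(a, x, y)
        # Backtracking (Armijo) line search.
        while (error(a - step * g, x, y) > e_now - 1e-4 * step * (g @ g)
               and step > 1e-12):
            step *= 0.5
        a = a - step * g
    return a

# Synthetic data (assumption): true parameters are a = (2, -3).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-3.0 * x) + 0.01 * rng.normal(size=x.size)

a_start = np.array([1.0, 0.0])
a_fit = minimize(a_start, x, y)
print(a_fit, error(a_fit, x, y))
```

Gradient descent is only one of many algorithms and can converge slowly or to a local minimum on harder error surfaces.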
Pattern Recognition
• Closely related to nonlinear modeling
• Goal is to identify most likely category given an input vector
• Equivalent to drawing decision boundaries
• Following example
– Crab data
– Four categories
– Two composite inputs
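To make the link between classification and decision boundaries concrete, here is a small sketch of a nearest-centroid classifier with four categories and two inputs. It uses synthetic clusters, not the crab data, and nearest-centroid is a stand-in chosen for brevity; its decision boundaries are the perpendicular bisectors between class centroids.

```python
import numpy as np

def fit_centroids(X, labels):
    """Return the class labels and the mean input vector of each class."""
    classes = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each input vector to the class with the nearest centroid."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

# Synthetic data (assumption): four well-separated clusters in two dimensions.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, -3.0], [-3.0, 3.0], [3.0, -3.0], [3.0, 3.0]])

def sample(n_per_class):
    labels = np.repeat(np.arange(4), n_per_class)
    X = centers[labels] + 0.5 * rng.normal(size=(labels.size, 2))
    return X, labels

X_train, y_train = sample(50)
X_test, y_test = sample(25)

classes, centroids = fit_centroids(X_train, y_train)
accuracy = float(np.mean(predict(X_test, classes, centroids) == y_test))
print(accuracy)
```

Real data sets rarely separate this cleanly, so test accuracy on held-out examples is the number to watch.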
Crabs Data Set
Biomedical Application
• Goal: identify brain cell types from microrecordings
• Current research project
• 5 categories of cell types
• Created metrics to characterize signals
• Following scatterplot shows 2 of these metrics
Neurosurgery Example
Nonlinear Modeling
• Given many examples of observed variables, create a model that can predict the output
• No other assumed knowledge
• Observed variables
– Quantitative
– Measurable
[Figure: block diagram. Observed variables x and unobserved variables z enter a process that produces the output y.]
Nonlinear Modeling
• Observed variables may not be causal
• Not all causal effects are observed
• Model will not be perfect
• How do you measure how good the model is?
[Figure: block diagram. The process maps observed variables x and unobserved variables z to the output y; the model maps only the observed variables x to the output y.]
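One common answer to the question of how to measure model quality is prediction error on data the model was not fit to. The sketch below estimates it by k-fold cross-validation; the polynomial model, the synthetic data, and the function name `cross_val_mse` are assumptions for illustration only.

```python
import numpy as np

def cross_val_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coef, x[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic process (assumption): a sine curve observed with additive noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=100)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

mse_line = cross_val_mse(x, y, degree=1)
mse_quintic = cross_val_mse(x, y, degree=5)
print(mse_line, mse_quintic)
```

Because the noise stands in for unobserved variables, the held-out error cannot reach zero even for a well-chosen model.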
Smoothing
• For single-input single-output (SISO) systems, can plot the data
• Problem is to estimate a curve that most accurately predicts future points
• Could draw a smooth curve by hand
• More difficult to implement automatically
• More than one curve may be reasonable
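A hedged sketch of automatic smoothing, assuming a Gaussian kernel-weighted average (one of several possible smoothers) and synthetic data. The function name `kernel_smooth` and the two bandwidths are illustrative; running it with different bandwidths gives different curves, each of which could be called reasonable.

```python
import numpy as np

def kernel_smooth(x, y, x_query, bandwidth):
    """Estimate the curve at x_query as a Gaussian-weighted average of y."""
    z = (x_query[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    return (weights @ y) / weights.sum(axis=1)

# Synthetic SISO data (assumption): a sine curve plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * x)
y = truth + 0.3 * rng.normal(size=x.size)

x_query = np.linspace(0.0, 1.0, 50)
narrow = kernel_smooth(x, y, x_query, bandwidth=0.03)  # follows the data closely
wide = kernel_smooth(x, y, x_query, bandwidth=0.10)    # smoother, more biased

narrow_mse = float(np.mean((narrow - np.sin(2 * np.pi * x_query)) ** 2))
print(narrow_mse, float(np.max(np.abs(narrow - wide))))
```

The bandwidth plays the role of the judgment used when drawing a curve by hand: a small value tracks the noise, a large value flattens real features.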
Smoothing Example
Multiple “Reasonable” Solutions
Nonlinear Modeling
• Many methods do not work well
• Usually much more difficult than univariate smoothing
– Noise
– Multiple inputs
– Time-varying system
– Small data sets
• Still an active area of research
• Will discuss “tried and true” solutions
Overview of Course
• Introduction & review
• Linear models
• Univariate smoothing
• Optimization algorithms
• Nonlinear modeling
• Pattern recognition & classification
Application Areas
• Engineering
– Controls (system identification)
– Signal processing (estimation & prediction)
– Communications (channel equalization)
• Statistics
• Mathematics
• Computer science
• Systems science
Application Examples
• Time series prediction
– Aircraft carrier landing systems
• Spatial Wafer Patterns
• Fault Detection
• Machinery health monitoring
• Automated, objective credit rating
• Fraud detection
Time Series Prediction
Spatial Wafer Patterns
Wafer Components
Estimation (Regression) Results
Fault Detection in Semiconductor Manufacturing
Aircraft Carrier Landing System
• Can be very hard
– Limited visibility
– Rough seas
– Night
• Predict location at touch down
– Flight deck
– Aircraft
• Is rocking of flight deck predictable?
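Whether a motion is predictable can be tested by comparing a fitted predictor against a naive one on held-out samples. The sketch below assumes a synthetic signal (two sinusoids plus noise, not real flight deck data) and a linear autoregressive predictor fit by least squares; `make_lagged`, the model order, and the prediction horizon are illustrative choices.

```python
import numpy as np

def make_lagged(s, order, horizon):
    """Rows hold the last `order` samples; targets lie `horizon` steps ahead."""
    rows = np.array([s[t - order + 1 : t + 1]
                     for t in range(order - 1, len(s) - horizon)])
    targets = s[order - 1 + horizon :]
    return rows, targets

# Synthetic motion (assumption): two sinusoids plus measurement noise.
rng = np.random.default_rng(0)
t = np.arange(600)
s = (np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 23 + 1.0)
     + 0.05 * rng.normal(size=t.size))

order, horizon, n_train = 8, 5, 400
rows, targets = make_lagged(s, order, horizon)
A = np.column_stack([np.ones(len(rows)), rows])

# Fit on the first part of the record, evaluate on the rest.
coef, *_ = np.linalg.lstsq(A[:n_train], targets[:n_train], rcond=None)
ar_mse = float(np.mean((A[n_train:] @ coef - targets[n_train:]) ** 2))

# Naive baseline: assume the signal stays at its most recent value.
persistence_mse = float(np.mean((rows[n_train:, -1] - targets[n_train:]) ** 2))
print(ar_mse, persistence_mse)
```

If the fitted predictor does not beat the naive baseline on held-out data, that is evidence the signal is not predictable at that horizon with that model.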
Machinery Health Monitoring
• Cost of machinery failure can be very high
• Recent growth in real-time monitoring
– Health and Usage Monitoring Systems (HUMS)
– Condition Based Maintenance (CBM)
• Reduce costs
• Increase safety
Fraud Detection
• Credit card fraud cost $864 million in 1992
• How quickly can fraud be detected?
• Credit card companies have amassed large databases
• What are the patterns of fraud?
• Active area of research
Past Projects
• Many past projects
– See reports & slides on the web
• Many time series applications
– Need not be time series related
• Many have resulted in conference and journal publications
• Expect improved quality this term
Project Ideas
• It is up to you to identify a project
• Preferred
– Data readily available (no new instrumentation or study design)
– Independent samples (not time series data)
– Engineering related
– High likelihood of success (no financial forecasting)
Course Logistics
• Project oriented
– Project reports
– Must meet IEEE journal requirements
– May be encouraged to publish
– Brief oral slide presentation at end of term
• Most projects are applied
• May also create new methods or compare existing methods
Prerequisites
• Helpful
– Random processes (ECE 565)
– Signal processing (ECE 566)
– Proficiency in MATLAB or similar
• Required
– Calculus
– Probability & statistics (STAT 451)
– Linear algebra (MTH 343)
– Proficiency in programming