time series 1

Upload: quanduyminh

Post on 03-Mar-2016


DESCRIPTION

Time Series Data Analysis

TRANSCRIPT

  • Time Series Data Analysis - Yaji Sripada

  • In this lecture you will learn:
      • What are time series?
      • How to analyse time series?
          • Pre-processing
          • Trend analysis
          • Pattern analysis

  • Introduction
      • What are time series?
          • Values of a variable measured at different time points
          • E.g. scuba dive profile data is a time series
      • Why are time series important?
          • Many domains produce large volumes of time series:
              • Meteorology: weather simulations predict values of dozens of weather parameters, such as temperature and rainfall, at hourly intervals
              • Gas turbines carry hundreds of sensors that measure parameters such as fuel intake and rotor temperature every second
              • Neonatal Intensive Care Units (NICU) measure physiological data such as blood pressure and heart rate every second
          • Time series reveal the temporal behaviour of the underlying mechanism that produced the data, such as the diving behaviour of a scuba diver

  • Example (Gas Turbine)
      • A time series has a sequence of values and their corresponding timestamps (the times at which the values are true)
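A minimal sketch of that structure in Python, assuming numeric timestamps (for example, seconds) that strictly increase. The class name `TimeSeries`, its `intervals` method and the readings are hypothetical illustrations, not values taken from the gas turbine plot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimeSeries:
    """Values paired with the timestamps at which they are true (hypothetical class)."""
    timestamps: List[float]  # assumed numeric, e.g. seconds since recording began
    values: List[float]

    def __post_init__(self) -> None:
        # Every value needs exactly one timestamp.
        if len(self.timestamps) != len(self.values):
            raise ValueError("each value needs exactly one timestamp")
        # Assumption: timestamps are strictly increasing.
        if any(b <= a for a, b in zip(self.timestamps, self.timestamps[1:])):
            raise ValueError("timestamps must be strictly increasing")

    def intervals(self) -> List[float]:
        """Time difference (dT) between each pair of successive values."""
        return [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]


# Illustrative one-second readings (made-up numbers).
rotor_temp = TimeSeries([0.0, 1.0, 2.0, 3.0], [510.2, 511.0, 510.7, 512.3])
print(rotor_temp.intervals())  # [1.0, 1.0, 1.0]
```

Keeping timestamps explicit, rather than assuming a fixed sampling rate, lets later steps such as smoothing use the actual gap between readings.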

  • Time Series Autocorrelation
      • Autocorrelation is a special property of time series
      • Each value of a time series is correlated with earlier values from the same series
      • This means data measurements in a time series are not independent
          • The depth of a scuba dive at a particular timestamp depends on the depths reached before that timestamp
          • The periodic patterns seen on the gas turbine plot in the previous slide are a result of autocorrelation
      • Time series analysis is special because of this temporal dependency among the values of a series
      • A time series exhibits internal structure
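Autocorrelation can be measured directly. A minimal sketch, assuming the standard sample autocorrelation estimator at lag k (the slide gives no formula): the sum of products of deviations from the mean between the series and a copy shifted by k steps, divided by the sum of squared deviations. The function name `autocorrelation` is hypothetical.

```python
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (standard estimator)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(x))")
    mean = sum(x) / n
    # Total squared deviation of the series from its mean.
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Pair each value with the value `lag` steps later.
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


# A series that repeats every 4 steps.
periodic = [0, 1, 0, -1] * 6
print(round(autocorrelation(periodic, 4), 3))  # 0.833: strongly like itself 4 steps back
print(round(autocorrelation(periodic, 2), 3))  # -0.917: half a period apart, values flip sign
```

A value near +1 at some lag means values that far apart tend to move together, which is exactly the dependency that makes successive measurements non-independent.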

  • Analysis of Time Series
      • Three main steps
          • Pre-processing
          • Trend analysis
          • Pattern analysis
      • Not all applications require all three steps
          • Knowledge acquisition studies provide guidance for determining which steps are required
      • Pre-processing
          • The input raw series may be noisy, due to errors in measurement or observation
          • The data need to be smoothed to remove noise
          • There are many noise removal techniques, also known as filters, such as:
              • Moving averages, or the mean filter
              • The median filter

  • Example Series

  • Rate of change is sensitive to noise
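The effect can be reproduced numerically. The sketch below uses a hypothetical helper `rate_of_change` and made-up data, computing the rate of change as the first difference divided by the time step. A steady rise with a true rate of 2.0 per second gets a small alternating error of ±0.05 added; because that error changes sign between successive samples, the computed rate swings between about 1.0 and 3.0.

```python
from typing import List, Sequence


def rate_of_change(times: Sequence[float], values: Sequence[float]) -> List[float]:
    """(v[i] - v[i-1]) / (t[i] - t[i-1]) for each pair of successive samples."""
    return [(values[i] - values[i - 1]) / (times[i] - times[i - 1])
            for i in range(1, len(values))]


times = [i * 0.1 for i in range(11)]          # samples every 0.1 s
clean = [2.0 * t for t in times]              # steady rise: true rate is 2.0 per s
noise = [0.05 if i % 2 else -0.05 for i in range(len(times))]  # small alternating error
noisy = [c + e for c, e in zip(clean, noise)]

print(max(abs(r) for r in rate_of_change(times, clean)))  # about 2.0
print(max(abs(r) for r in rate_of_change(times, noisy)))  # about 3.0
```

An error of only 0.05 in the values produces an error of 1.0 in the rate, because differencing subtracts two noisy values and then divides by a small time step.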

  • Mean Filter
      • There are many versions
      • Our version (weighted average method):
          • Assume a window time size T for the filter
          • Let dT be the difference in time between two successive values
          • For each value in the series, compute
            current smoothed value = ((previous smoothed value * T) + (current value * dT)) / (T + dT)
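The weighted average formula translates directly into a loop. A minimal sketch, assuming the first smoothed value is simply the first raw value (the slide does not say how the filter is initialised); `mean_filter` is a hypothetical name.

```python
from typing import List, Sequence


def mean_filter(times: Sequence[float], values: Sequence[float], window: float) -> List[float]:
    """Weighted-average smoothing:
    smoothed[i] = (smoothed[i-1] * T + values[i] * dT) / (T + dT),
    with T = window and dT = times[i] - times[i-1].
    Assumption: the first smoothed value equals the first raw value.
    """
    if not values:
        return []
    smoothed = [float(values[0])]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        smoothed.append((smoothed[-1] * window + values[i] * dt) / (window + dt))
    return smoothed


times = [0, 1, 2, 3, 4]
values = [10, 10, 40, 10, 10]                   # one noisy spike at t = 2
print(mean_filter(times, values, window=2.0))  # approx. [10.0, 10.0, 20.0, 16.67, 14.44]
```

A larger T gives the previous smoothed value more weight, so the output is smoother but reacts more slowly; here the spike of 40 is damped to 20 and then decays over the following samples.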

  • Smoothing

  • Median Filter
      • The idea is similar to the mean filter, but instead of the mean we use the median of the values in the window
      • Note: in our version of the mean filter we did not compute a simple mean (average) of the selected values; we used a weighted average
      • Known to perform better in the presence of outliers
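A minimal sketch of a median filter, assuming a centred window of an odd number of values that is truncated at the ends of the series (padding the ends is an equally common choice). `median_filter` is a hypothetical name; `statistics.median` is from the Python standard library.

```python
from statistics import median
from typing import List, Sequence


def median_filter(values: Sequence[float], size: int = 3) -> List[float]:
    """Replace each value by the median of a centred window of `size` values.

    Assumption: near the ends of the series the window is truncated to the
    values that exist, rather than padded.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    half = size // 2
    return [float(median(values[max(0, i - half): i + half + 1]))
            for i in range(len(values))]


values = [10, 10, 40, 10, 10]              # one outlier at index 2
print(median_filter(values, size=3))      # [10.0, 10.0, 10.0, 10.0, 10.0]
```

The isolated spike is removed entirely rather than spread over its neighbours, which is why the median copes better with outliers than an average does.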

  • Trend Analysis
      • Trends can be established using
          • line-fitting techniques for linear data
          • curve-fitting techniques for non-linear data
      • Line-fitting techniques for time series are more popularly called segmentation techniques
      • There are many segmentation algorithms:
          • Sliding window
          • Top-down
          • Bottom-up
          • Others (genetic algorithms, wavelets, etc.)
      • Each segmentation algorithm has several implementation variants built on the same main method; we only learn the main method
      • In general, segmentation can be viewed as a search for the best possible combination of segments in the space of all possible segments
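For linear data, the simplest line-fitting technique is an ordinary least-squares fit of one straight line to the whole series; segmentation generalises this to several lines. A minimal sketch with a hypothetical `fit_line` helper and made-up data:

```python
from typing import Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line: value = slope * time + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


slope, intercept = fit_line([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.0, 9.0])
print(round(slope, 2), round(intercept, 2))  # 1.99 1.02
```

The slope summarises the trend (here, a rise of roughly 2 units per time step) while the small wiggles around the line are left out.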

  • Segmentation
      • The curve at the top shows the original time series
      • The graphic below it is the piecewise linear representation, or segmented version, of the series
      • The segmented version of the time series is an approximation of the original series
      • In other words, segmentation may lose information as well as noise

  • Error Tolerance Value (ETV)
      • One important parameter controlling the segmentation process is the error tolerance value
      • It is the amount of error that can be allowed in the segmented representation, and corresponds to the allowed information loss
      • If the ETV is zero, segmentation returns a segmented representation without any information loss
      • A large enough ETV makes segmentation return a single segment, losing all the information contained in the original signal
      • Specifying the ETV is linked to distinguishing information from noise
          • in a particular context
          • for a particular task

  • Cost Computation
      • All segmentation algorithms need a method to compute the cost of segmentation
      • Several techniques are possible:
          • Simply take the maximum error in a segment
          • Compute the total error in a segment
          • Compute the least square error
      • You will use the maximum error as the cost metric in practical 2
      • I have an implementation of least square error as well, if anybody wants to explore it
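A minimal sketch of the three cost metrics, assuming each segment is approximated by the straight line joining its first and last points (linear interpolation); a least-squares regression line is an alternative approximation. Here the third metric is the sum of squared errors against that line. All function names are hypothetical.

```python
from typing import List, Sequence


def residuals(times: Sequence[float], values: Sequence[float], start: int, end: int) -> List[float]:
    """Vertical errors of points start..end (inclusive) against the straight line
    joining the segment's first and last points (assumed approximation)."""
    t0, v0, t1, v1 = times[start], values[start], times[end], values[end]
    slope = (v1 - v0) / (t1 - t0) if t1 != t0 else 0.0
    return [values[i] - (v0 + slope * (times[i] - t0)) for i in range(start, end + 1)]


def max_error(r: Sequence[float]) -> float:
    """Largest absolute error in the segment (the metric used in practical 2)."""
    return max(abs(e) for e in r)


def total_error(r: Sequence[float]) -> float:
    """Sum of absolute errors in the segment."""
    return sum(abs(e) for e in r)


def squared_error(r: Sequence[float]) -> float:
    """Sum of squared errors in the segment."""
    return sum(e * e for e in r)


times = [0, 1, 2, 3, 4]
values = [0.0, 1.5, 3.0, 2.0, 4.0]
r = residuals(times, values, 0, 4)                      # line from (0, 0) to (4, 4)
print(max_error(r), total_error(r), squared_error(r))  # 1.0 2.5 2.25
```

The maximum error bounds how far any single point may stray from its segment, whereas the total and squared errors let many small deviations add up.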

  • Sliding Window Segmentation
      • This algorithm is suitable for segmenting time series obtained in real time (streaming time series)
      • Requirements
          • Develop a method for computing the cost of a segment
          • Select two parameters: an appropriate window size and an error tolerance value
      • The method
          1. Form a segment with the values of the input series falling in the window
          2. Compute the cost of the segment
          3. While the cost of the segment is below the error tolerance value, grow the segment by moving the window forward in the series
          4. When a segment cannot grow any more, store it in the segmented representation and continue at step 1 with a new segment
      • You will study an implementation of this algorithm in practical 2
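A minimal sketch of sliding window segmentation, assuming the maximum-error cost against the line joining a segment's end points, that a segment fits when its cost does not exceed the ETV, and that each new segment starts at the last point of the previous one. This is not the practical 2 implementation; `segment_cost` and `sliding_window` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def segment_cost(times: Sequence[float], values: Sequence[float], start: int, end: int) -> float:
    """Maximum absolute error of points start..end (inclusive, end > start) against
    the straight line joining the segment's first and last points (an assumption)."""
    t0, v0 = times[start], values[start]
    slope = (values[end] - v0) / (times[end] - t0)
    return max(abs(values[i] - (v0 + slope * (times[i] - t0))) for i in range(start, end + 1))


def sliding_window(times: Sequence[float], values: Sequence[float],
                   tolerance: float, window: int = 2) -> List[Tuple[int, int]]:
    """Segment a series into (start, end) index pairs, end inclusive.
    An initial window whose cost already exceeds the ETV is kept as is."""
    if window < 2:
        raise ValueError("window must contain at least two points")
    n = len(values)
    if n < 2:
        return [(0, 0)] if n else []
    segments = []
    start = 0
    while start < n - 1:
        # Step 1: form a segment from the points falling in the initial window.
        end = min(start + window - 1, n - 1)
        # Steps 2-3: grow it one point at a time while its cost stays within the ETV.
        while end + 1 < n and segment_cost(times, values, start, end + 1) <= tolerance:
            end += 1
        # Step 4: store the segment; the next one starts where this one ended (assumption).
        segments.append((start, end))
        start = end
    return segments


times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 2, 3, 2, 1, 0]
print(sliding_window(times, values, tolerance=0.0))   # [(0, 3), (3, 6)]
print(sliding_window(times, values, tolerance=10.0))  # [(0, 6)]
```

With an ETV of zero the rise and the fall become two exact segments; with a large ETV the whole series collapses into a single segment, matching the behaviour described for the error tolerance value.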

  • Bottom-up Segmentation
      • Empirical evaluation studies of segmentation algorithms suggest that the bottom-up algorithm is the best
          • Because it considers the whole series when choosing which segments to merge, giving a more globally optimized segmented representation
      • Requirements
          • Develop a method for computing the cost of merging adjacent segments
          • Select an appropriate error tolerance value
      • Bottom-up approach to segmentation
          • Begin by creating n/2 segments joining adjacent points in an n-length time series
          • Compute the cost of merging each pair of adjacent segments
          • Iteratively merge the lowest-cost pair until a stopping criterion is met
          • The stopping criterion is based on the error tolerance value
      • You will study an implementation of this algorithm in practical 2
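A minimal sketch of bottom-up segmentation, assuming the maximum-error cost against the line joining a segment's end points. The initial n/2 segments are disjoint pairs of adjacent points, so the final segments are disjoint too; when n is odd the last point is folded into the final pair (an assumption). Merge costs are recomputed on every pass for clarity rather than speed. Names are hypothetical and this is not the practical 2 implementation.

```python
from typing import List, Sequence, Tuple


def segment_cost(times: Sequence[float], values: Sequence[float], start: int, end: int) -> float:
    """Maximum absolute error of points start..end (inclusive, end > start) against
    the straight line joining the segment's first and last points (an assumption)."""
    t0, v0 = times[start], values[start]
    slope = (values[end] - v0) / (times[end] - t0)
    return max(abs(values[i] - (v0 + slope * (times[i] - t0))) for i in range(start, end + 1))


def bottom_up(times: Sequence[float], values: Sequence[float],
              tolerance: float) -> List[Tuple[int, int]]:
    """Segment a series into (start, end) index pairs, end inclusive."""
    n = len(values)
    if n < 2:
        return [(0, 0)] if n else []
    # Finest approximation: n/2 segments, each joining two adjacent points.
    segments = [(i, i + 1) for i in range(0, n - 1, 2)]
    if n % 2:
        # Odd length: fold the last point into the final segment.
        segments[-1] = (segments[-1][0], n - 1)
    while len(segments) > 1:
        # Cost of merging each pair of adjacent segments into one.
        costs = [segment_cost(times, values, segments[i][0], segments[i + 1][1])
                 for i in range(len(segments) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        # Stopping criterion: even the cheapest merge would exceed the ETV.
        if costs[best] > tolerance:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 2, 3, 2, 1, 0]
print(bottom_up(times, values, tolerance=0.0))   # [(0, 3), (4, 6)]
print(bottom_up(times, values, tolerance=10.0))  # [(0, 6)]
```

Because every candidate merge across the whole series is compared before each step, the cheapest merge is always taken first, unlike the sliding window, which commits to a segment as soon as it stops growing.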

  • Wind Prediction Data

  • Segmentation of wind prediction data

    [Figure: wind speed (knots) against time, shown with a segmentation model and a step model; plotted wind speeds 4, 6, 7, 10, 12, 16 and 18 knots; annotated "5 knots"]

  • Pattern Analysis
      • What is a pattern?
          • A portion of the series that can be identified as a unit rather than as an enumeration of all the values in that portion
          • Some patterns may be periodic: they repeat at regular time intervals (autocorrelation)
      • Users are interested in patterns occurring in time series, e.g.
          • rapid ascent patterns in scuba dive profile data
          • spikes and oscillations in gas turbine data
      • Mainly two steps
          • Pattern location
          • Pattern classification
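Pattern location can be as simple as scanning for runs of samples that satisfy a condition. A hypothetical sketch for the scuba example: depth is positive downwards, so the ascent rate is the decrease in depth per unit time, and a run is flagged while that rate exceeds a threshold `max_rate`. The function name, threshold and data are illustrative only and are not a diving safety limit.

```python
from typing import List, Optional, Sequence, Tuple


def locate_rapid_ascents(times: Sequence[float], depths: Sequence[float],
                         max_rate: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs where depth decreases faster
    than max_rate (depth units per time unit). Depth is positive downwards."""
    runs = []
    run_start: Optional[int] = None
    for i in range(1, len(depths)):
        ascent_rate = (depths[i - 1] - depths[i]) / (times[i] - times[i - 1])
        if ascent_rate > max_rate:
            if run_start is None:
                run_start = i - 1          # pattern begins at the previous sample
        elif run_start is not None:
            runs.append((run_start, i - 1))  # pattern ended at the previous sample
            run_start = None
    if run_start is not None:              # pattern still running at the end
        runs.append((run_start, len(depths) - 1))
    return runs


times = [0, 1, 2, 3, 4, 5, 6]        # minutes (made-up profile)
depths = [0, 10, 20, 20, 8, 6, 0]    # metres
print(locate_rapid_ascents(times, depths, max_rate=9.0))  # [(3, 4)]
```

Once a pattern is located as a unit like this, the classification step can then decide what kind of pattern the run represents.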

  • Pattern Classification and Time Scale
      • Most patterns are classified based on their visual shape, e.g. a step pattern looks like a step
      • When the time scale changes, the visual shape of a pattern changes
      • Pattern classification is therefore sensitive to the time scale at which the series is visualized

  • Symbolic Representations of Time Series
      • A recent trend in mining time series: convert a numerical time series into an equivalent symbolic representation
      • Symbolic Aggregate Approximation (SAX) is a well-known representation, and efficient algorithms are available for performing this transformation
      • Once a time series is available in string form, string analysis techniques can be used for analysing the time series data
      • You will use a simple string-based representation of dive profile data in practical 2, e.g. baabccbc
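A minimal, simplified sketch of SAX, assuming the series length is a multiple of the number of frames: z-normalise, average each equal-width frame (Piecewise Aggregate Approximation), then map each average to a letter using breakpoints that divide the standard normal distribution into equally likely regions. `NormalDist.inv_cdf`, `mean` and `pstdev` are from Python's `statistics` module (3.8+); `sax` is a hypothetical name, and this is not the dive profile encoding used in practical 2.

```python
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def sax(values: Sequence[float], n_segments: int, alphabet: str = "abc") -> str:
    """Simplified SAX. Assumes len(values) is a multiple of n_segments."""
    n = len(values)
    if n_segments < 1 or n % n_segments:
        raise ValueError("len(values) must be a positive multiple of n_segments")
    # 1. z-normalise (a constant series maps to all zeros).
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    # 2. PAA: mean of each equal-width frame.
    frame = n // n_segments
    paa = [mean(z[i * frame:(i + 1) * frame]) for i in range(n_segments)]
    # 3. Breakpoints splitting N(0, 1) into len(alphabet) equally likely regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # The letter index is the number of breakpoints below the frame mean.
    return "".join(alphabet[sum(p > b for b in breakpoints)] for p in paa)


print(sax([1, 1, 2, 2, 3, 3, 2, 2], n_segments=4))  # abcb
```

Each letter summarises a whole frame, so the string is both shorter than the series and directly usable with ordinary string tools such as substring search.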

  • Summary
      • Time series are ubiquitous!
      • Three main data analysis steps
          • Pre-processing: smoothing
          • Trend analysis: line fitting
          • Pattern analysis: location and classification; issues due to time scale