
Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis

Doyup Lee*
Department of Creative IT Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang, Gyeongbuk, Republic of Korea
[email protected]

Abstract— Anomaly detection in database management systems (DBMSs) is difficult because of the increasing number of statistics (stat) and event metrics in big data systems. In this paper, I propose an automatic DBMS diagnosis system that detects anomaly periods with abnormal DB stat metrics and finds causal events in those periods. Reconstruction error from a deep autoencoder and a statistical process control approach are applied to detect time periods with anomalies. Related events are found using time series similarity measures between events and abnormal stat metrics. After training the deep autoencoder with DBMS metric data, the efficacy of anomaly detection is investigated on other DBMSs containing anomalies. Experiment results show the effectiveness of the proposed model, especially the batch temporal normalization layer. The proposed model is used to publish automatic DBMS diagnosis reports in order to determine DBMS configuration and SQL tuning.

Index Terms— anomaly detection, non-stationary time series, automatic diagnosis, reconstruction error, statistical process control, transfer learning

I. INTRODUCTION

Database management systems (DBMSs) have become a critical part of Big Data applications because of the difficulty of controlling the growing number and scale of data [17]. Anomaly detection in DBMSs is important to 1) prevent crucial problems such as DBMS blackouts and low performance, and 2) determine configuration and SQL tuning points for database administrators (DBAs). Previously, DBAs have investigated anomaly periods with DBMS disorders and found causal events manually, based on their own experience. However, controlling a modern DBMS has surpassed human ability as the size and complexity of DBMSs grow [1].

"Anomaly" can be defined in different ways depending on the data characteristics and the domain [14]. From a practical viewpoint, I define an anomaly in a time series as an unpredictable or unexplainable change that cannot be observed or inferred in advance. For example, sudden additive outliers or trend changes are anomalies from the unpredictability perspective. The unpredictability approach is appropriate for defining anomalies in a practical manner: even though a change arises, it is not crucial if it was predictable from the past.

Anomaly detection is hard to implement with machine learning because labeled data are expensive and anomalies are few in number [5]. In addition, anomaly detection is important in professional fields such as DBMSs, IT security, and industrial monitoring [14], so producing labeled anomaly data requires difficult work by experts in each domain. It is prohibitively expensive for experts to construct a labeled anomaly dataset by finding and recording the few anomalies that occur each time. Thus, many previous anomaly detection models cannot be utilized in practice because most of them are based on a supervised learning approach [11, 16]. Especially in the time series case, previous research has been limited to stationary or periodic time series data.

* Data, the workstation, and the efficacy of the experiment results are provided by EXEM Co.

With the development of big data technology, deep learning has achieved outstanding performance in various areas such as image recognition and natural language processing [18, 21]. Deep learning models can extract hidden representations in big data that humans cannot easily find with hand-crafted programming. Among them, the deep autoencoder finds hidden representations in high-dimensional data with a multi-layer encoder-decoder structure, extracting lower-dimensional abstract features to reconstruct the input data [19].

In this paper, I propose an anomaly detection model based on reconstruction error, using an autoencoder with non-stationary time series data from DBMSs. The proposed model aims to detect anomalies in time series DBMS metrics, determine the time periods containing anomalies, and find the causal events related to the anomalies in those periods.

II. THEORETICAL BACKGROUND

A. Anomaly Detection

Anomaly detection refers to finding unexpected behaviors that do not conform to the observed pattern [5]. In machine learning, it concerns data patterns that are not available during training, or that are insufficient to train on because of the small number of samples. Based on model characteristics, anomaly detection models are divided into five categories: probabilistic, distance-based, reconstruction-based, domain-based, and information-theoretic models. In particular, neural network-based models can be utilized to detect anomalies as reconstruction-based models [14].

Reconstruction-based anomaly detection models use reconstruction error as the criterion for anomaly detection or as an anomaly score.



Fig. 1. Control Chart Example [13]

That is, a data pattern is classified as an anomaly if it has a high error score after reconstruction. As reconstruction models, neural network-based models with an encoder-decoder structure and subspace-based models such as (kernel) PCA are commonly used [2, 6, 15]. Setting the error threshold is one of the problems in utilizing reconstruction-based models for anomaly detection.
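As an illustration of the subspace variant, the following is a minimal NumPy sketch (not from the paper) that scores each sample by its squared reconstruction error after projecting onto the top principal components; the threshold question raised above is left open here.

import numpy as np

def pca_reconstruction_error(X, n_components=2):
    # Center the data, project onto the top principal components,
    # reconstruct, and score each row by its squared reconstruction error.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_components]                  # (k, n_features)
    recon = Xc @ basis.T @ basis               # back-projection into the subspace
    return np.sum((Xc - recon) ** 2, axis=1)   # per-sample anomaly score

Samples whose score exceeds a chosen threshold would be flagged as anomalies.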

B. Deep Autoencoder

An autoencoder is an unsupervised learning model that has an encoder-decoder structure built from feed-forward neural networks. It trains a reconstruction model that infers the input x as its output. The input x is encoded into a feature space C(x) and decoded back into the original space. In general, an autoencoder is used to extract a hidden representation of the input distribution with lower dimensionality [7]. In this paper, an autoencoder is trained and used to reconstruct the input data. With multi-layer neural networks in the encoder and decoder, a deep (stacked) autoencoder can extract more abstract and complex representations of the input distribution [19].
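The encoder-decoder structure described above can be sketched as follows with tf.keras. This is an illustrative reconstruction, not the paper's TensorFlow 1.0 code; the default layer sizes mirror the (150)-(50)-(150*) configuration reported in Section IV.

import tensorflow as tf

def build_autoencoder(input_dim, hidden_dims=(150, 50)):
    # Mirrored encoder-decoder: 150 -> 50 -> 150 -> input_dim for the defaults.
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for h in hidden_dims:                      # encoder
        x = tf.keras.layers.Dense(h, activation="relu")(x)
    for h in reversed(hidden_dims[:-1]):       # decoder (reverse of the encoder)
        x = tf.keras.layers.Dense(h, activation="relu")(x)
    outputs = tf.keras.layers.Dense(input_dim)(x)   # linear reconstruction
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    return model

# The model is trained to reproduce its own input, e.g.:
# model = build_autoencoder(input_dim=180)   # 6 metrics x 30 time steps, flattened
# model.fit(x_train, x_train, validation_data=(x_val, x_val), epochs=100, batch_size=1500)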

C. Statistical Process Control

Statistical Process Control (SPC) is a statistical model for monitoring the outputs of a process. The objective of SPC is to find assignable causes that are unpredictable and that lower process capability or performance. In order to know whether assignable causes exist, control charts are used in SPC. Control charts monitor a characteristic value of a process, such as the sample standard deviation or the sample mean of the process output.

When the sample mean is used, the chart is called an X̄ chart. The average of the sample means, X̄, is set as the center line of the control chart. In addition, X̄ ± 3σ become the upper control limit (UCL) and lower control limit (LCL), where σ is the standard deviation of the sample means [12].

A process is considered out-of-control when assignable causes exist in the process. There are many rules for identifying out-of-control conditions; the most typical rule is to decide that the process is out of control when the characteristic value is located outside the control limits, UCL or LCL. When this occurs, the process is considered to contain assignable causes that have to be eliminated in order to maintain high stability and capability of the process.
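A minimal sketch of such a control chart computation, assuming the monitored characteristic values are collected in a NumPy array (function names are illustrative, not from the paper's code):

import numpy as np

def control_limits(values, k=3.0):
    # Center line and k-sigma limits for a monitored characteristic value.
    center = values.mean()
    sigma = values.std(ddof=1)
    return center - k * sigma, center, center + k * sigma

def out_of_control(values, k=3.0):
    # Typical rule: flag points falling outside the UCL or LCL.
    lcl, _, ucl = control_limits(values, k)
    return (values < lcl) | (values > ucl)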

Previous reconstruction-based anomaly detection models have a limitation in how to determine an appropriate anomaly threshold for different data [11, 15]. However, when reconstruction error is combined with SPC, the anomaly threshold can be set on a statistical basis from the process control perspective.

D. Non-stationary Time Series

Let $Z = \{Z_t\}_{t=-\infty}^{+\infty}$ be a sequence of random variables, called a time series. It is said to be strictly stationary if its distribution is invariant to time shifts:

$$F_{Z_{t_1},\dots,Z_{t_n}}(x_1,\dots,x_n) = F_{Z_{t_1+k},\dots,Z_{t_n+k}}(x_1,\dots,x_n) \qquad (1)$$

Strict stationarity is hard to verify in real-world data, so the weak stationarity condition is usually applied. Weak stationarity means the first and second moments are invariant to time periods.
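For reference, the weak stationarity condition mentioned above is the standard one:

$$E[Z_t] = \mu, \qquad \mathrm{Var}(Z_t) = \sigma^2 < \infty, \qquad \mathrm{Cov}(Z_t, Z_{t+k}) = \gamma(k) \quad \text{for all } t,$$

i.e., the mean and variance are constant and the autocovariance depends only on the lag $k$, not on $t$.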

In contrast, a non-stationary time series has a different distribution in different time periods. In other words, a non-stationary time series has time-varying mean and variance. For example, a time series with a trend over time is non-stationary. In time series analysis, a non-stationary series is converted into a stationary one using de-trending, differencing, and other data transformation techniques [20].
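A minimal sketch of differencing, the most common of these transformations (illustrative, not the paper's preprocessing code):

import numpy as np

def difference(z, d=1):
    # d-th order differencing along the time axis; a standard transformation
    # for removing trend-type non-stationarity.
    out = np.asarray(z, dtype=float)
    for _ in range(d):
        out = np.diff(out, axis=0)
    return out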

When a machine learning model handles time series data, stationarity is also important with respect to function approximation. If the input data are non-stationary, the model has to learn a different input distribution as the input changes, and it eventually fails to learn successfully. In summary, converting non-stationary time series to stationary ones is essential when a machine learning model learns its parameters.

III. PROPOSED MODEL

A. Problem Description

To define the problem, I interviewed senior DBAs and DB consultants who work at a DBMS monitoring company in the Republic of Korea. The company holds most of the market share (above 70%) of the DB performance management market in Korea.

When defects and disorders occur in a database, DBMS users call DB consultants in order to find the causal time periods and events. After finding them, the consultants resolve the problems by changing DBMS configurations or tuning wrong SQL. However, from detecting disorder periods to taking solutions, consultants have to depend on their own past experience and manual effort. They spend most of their time finding disorder periods and causal events in the log data analysis process, because there are hundreds of event metrics in a DBMS.

In order to analyze DBMS log data, they manually find the time points at which the main DB stats and wait events suddenly change. They then confirm the anomaly periods and find wait events whose patterns are similar to the change patterns of the main stat metrics. Based on the causal wait events detected, they implement solutions such as configuration or SQL tuning.


Fig. 2. Stat Metrics Example: Active Session and Session Logical Reads

Although DBMS configuration and SQL tuning may require the professional knowledge of DBAs and consultants, detection of causal periods and related events can be done automatically from DBMS metrics, which substantially reduces consultants' time consumption. The objective of the proposed model is automatic diagnosis that finds time periods with anomalies and the causal events related to the DB anomalies.

B. Data

In this paper, two kinds of time series data are used: 1) DBMS Statistics (stat) Metrics and 2) DBMS Event Metrics. Data are collected every minute.

1) DBMS Stat Metrics: Stat metrics describe basic information about DB status, i.e., how the DB works. There are hundreds of stat metrics, but six main stat metrics are used in this paper. The stat metrics were selected based on interviews with DBAs and consultants; they are the main metrics that DB consultants consider when finding anomaly periods. Example plots of stat metric data are shown in Figure 2: active sessions and session logical reads per minute.
• CPU Used: How much CPU is used at collection time.
• Active Session: The number of sessions executing SQL.
• Session Logical Reads: The number of blocks requested for the execution of queries.
• Physical Reads: The number of blocks read from disk.
• Execute Counts: The number of SQL executions.
• Lock Waiting Session: The number of sessions waiting on a lock.

2) DBMS Event Metrics: DBMS event metrics describe the wait information that occurs in the DB when sessions request queries. For example, direct path read, SQL*Net message from client, and LGWR wait for redo copy are contained in the event metrics. There are also hundreds of event metrics. They are used to diagnose the causes of DB disorders at the DBMS configuration and SQL levels in anomaly periods.

C. Model Scheme

Figure 3 shows the model scheme proposed for detecting anomalies in this paper. From DBMS stat metric data, an anomaly detector that determines time periods with anomalies is trained. After training, the model is utilized to detect anomaly periods in other DBMSs, not the same DBMS. In other words, the trained model is transferred to other DBMSs, and additional training is optional here.

Fig. 3. Proposed Model Scheme

Production of a pre-trained model is important from the service provider's perspective. There are many customer companies that want to detect anomalies automatically before consultation on their DBMSs. If there is no pre-trained model, each company, and even each DBMS, cannot use automatic anomaly diagnosis until it has stored its DBMS metrics and trained its own machine learning model with optimal hyper-parameters. After a pre-trained model with basic, valid performance is produced, each DBMS can be customized by an additional training process once its data are stored. Thus, in this paper, the efficacy of the transferred model is verified after training the anomaly detector.

When the anomaly detector finds anomaly periods, the causal events whose patterns are most similar to the anomalous stat metrics are investigated based on time series similarity measures such as dynamic time warping (DTW) [4, 9] or Pearson correlation [3].

Finally, the anomaly periods, related stat metrics, and event metrics are reported automatically to DB users and consultants. Based on the diagnosis result, consultants change the DBMS configuration or specific SQL in order to solve and prevent DB disorders.

D. Anomaly Detector Model Description

In order to detect anomalies in DBMSs, a reconstruction approach is used in the proposed model. A deep autoencoder is trained on DBMS stat metric data and determines anomaly periods based on reconstruction error. In other words, an output with high reconstruction error is interpreted as an anomalous sample that cannot be predicted from the patterns of past stat metrics.

Figure 4 shows the deep learning model architecture for anomaly detection. Through repeated experiments, the specific architecture was set to an autoencoder with 3 hidden layers. In this model, the time series inputs are connected to a batch temporal normalization layer in order to ensure the stationarity of each sample. In the decoder part, the reverse calculation is performed, corresponding to the encoder part.

Fig. 4. Anomaly Detector Model Architecture

E. Batch Temporal Normalization

Batch normalization is a critical technique for fast learning and generalization [8]. In this paper, a batch temporal normalization layer is proposed for the stationarity of the input time series. It normalizes the input data when a batch of input data is selected, as batch normalization does. However, it does not perform the same calculation as batch normalization, because the objective is not to match feature scales but to match the moments of different time periods.

Figure 5 shows the operational difference between batch normalization and batch temporal normalization. The yellow part is the calculation range of each normalization operation. Batch temporal normalization calculates the mean and standard deviation of each sample over its time steps, whereas batch normalization calculates them over both batches and time steps. Batch temporal normalization makes each sample have the same mean and standard deviation regardless of the time period, for stationarity of the input. I note that this calculation is essential when a machine learning model handles non-stationary time series data. As with batch normalization, scale and offset parameters are trained in the batch temporal normalization layer.
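The description above suggests a layer along the following lines. This is a hypothetical tf.keras sketch of the BTN idea (per-sample normalization over the time axis with learned scale and offset), not the paper's implementation.

import tensorflow as tf

class BatchTemporalNorm(tf.keras.layers.Layer):
    # Normalizes each sample over its own time axis (per sample, per feature),
    # then applies a learned scale (gamma) and offset (beta).
    def __init__(self, eps=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.eps = eps

    def build(self, input_shape):              # expects (batch, time_steps, features)
        f = int(input_shape[-1])
        self.gamma = self.add_weight(name="gamma", shape=(f,), initializer="ones")
        self.beta = self.add_weight(name="beta", shape=(f,), initializer="zeros")

    def call(self, x):
        mean = tf.reduce_mean(x, axis=1, keepdims=True)        # over time steps only
        var = tf.math.reduce_variance(x, axis=1, keepdims=True)
        x_hat = (x - mean) / tf.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta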

F. Anomaly Period Detection with SPC

In the test part of Figure 3, the pre-trained anomaly detector is used to determine time periods containing anomalies. The proposed model uses an SPC approach based on reconstruction error. In other words, the sample anomaly score in (2) is the characteristic value of a control chart that monitors process stability. If the sample anomaly score is outside the control limits, the corresponding time period is considered an anomaly period for which related events need to be found.

Fig. 5. Difference between Batch Normalization and Batch Temporal Normalization

$$(\text{Sample Anomaly Score})_{ij} = \frac{1}{T}\sum_{t=1}^{T}\,(y_{i,j,t} - \hat{y}_{i,j,t})^2 \qquad (2)$$

where $i$ and $j$ are the sample and feature indices, and $\hat{y}$ is the model's reconstruction of $y$.
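Assuming the targets and reconstructions are arranged as arrays of shape (samples, features, time steps), Eq. (2) is a one-liner in NumPy (illustrative, not the paper's code):

import numpy as np

def sample_anomaly_score(y, y_hat):
    # y, y_hat: (n_samples, n_features, T); returns a score per sample and feature.
    return np.mean((y - y_hat) ** 2, axis=2)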

IV. EXPERIMENTS

A. Training Results of Deep Autoencoder

In order to find appropriate hyper-parameters for the deep autoencoder model before training, repeated experiments were performed. The number of layers and neurons, the optimizer, the learning rate, the regularization parameter, and the batch size were selected after empirical repetition. The ADAM optimizer [10], a learning rate and regularization parameter of 0.001, and a batch size of 1500 are used to train the model. The proposed model is implemented using Python 2.7.12, TensorFlow 1.0.1, and NumPy 1.12.1. A workstation with a Titan X GPU is used to train the proposed model.

For training, the 6 stat metrics described in the section above are collected from a DBMS every minute for a month. The training time series data are constructed with a time window of 30 steps in order to detect anomaly periods within 30 minutes. The collected data are divided into 60%, 20%, and 20% for training, validation, and test. The trained model with the lowest validation loss is selected as the final model for generalization. The activation function in all layers except the output is the Rectified Linear Unit (ReLU); ReLU is better suited to real-valued time series than the sigmoid. Mean squared error is used as the loss function for training. For faster learning, the input features are normalized using the moments computed over the total time period.

$$\text{Train Loss} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{F}\sum_{t=1}^{T}\,(y_{i,j,t} - \hat{y}_{i,j,t})^2 \qquad (4)$$
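A minimal sketch of the windowing and chronological 60/20/20 split described above (names are illustrative, not from the paper's code):

import numpy as np

def make_windows(series, window=30):
    # series: (total_minutes, n_features) -> (n_samples, window, n_features)
    n = series.shape[0] - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

def train_val_test_split(x, ratios=(0.6, 0.2, 0.2)):
    # Chronological split of the windowed samples.
    n = len(x)
    a = int(ratios[0] * n)
    b = a + int(ratios[1] * n)
    return x[:a], x[a:b], x[b:]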

Table I shows the test results of various deep autoencoder models. Asterisks in Table I denote the reverse calculations in the decoder part of the autoencoder. I found that batch temporal normalization (BTN) improves the performance of a deep learning model whose input time series are non-stationary. In Table I, the simple (150)-(50)-(150*) model has the lowest test error although it does not include a batch temporal normalization layer. However, as shown in the next subsection, it was not selected as the optimal anomaly detector because it has no ability to detect anomaly periods. Although models with BTN have higher test loss than the simple one, they could detect anomaly periods accurately. Finally, the BTN-(150)-(50)-(150*)-BTN* model was selected as the optimal anomaly detector. I note that the batch temporal normalization layer is essential for capturing the internal dynamics of non-stationary time series in a deep learning model. Figure 6 shows the learning curve of the optimal anomaly detector on a log scale.

Fig. 6. Learning Curve of BTN-(150)-(50)-(150*)-BTN*

Model (Neurons)                                      Test (MSE)
PCA-network (50)                                        0.7951
PCA-network (50) with BTN                               1.1560
(150)-(50)-(150*)                                       0.3096
(150)-BN-(50)-BN*-(150*)                                0.8620
BN-(150)-(50)-(150*)-BN*                                1.2432
BTN-(150)-(50)-(150*)-BTN*                              0.6912
BTN-(150)-BN-(50)-BN*-(150*)-BTN*                       0.7803
BN-(500)-(300)-(150)-(300*)-(500*)-BN*                  4.4816
BTN-(500)-(300)-(150)-(300*)-(500*)-BTN*                0.7034
BTN-(500)-BN-(300)-(150)-(300*)-BN*-(500*)-BTN*         0.6925

TABLE I
TEST RESULTS OF TRAINED ANOMALY DETECTOR MODELS

B. Anomaly Periods Detection

The anomaly period detection test is conducted on other DBMSs, from which the training data did not come. Detection performance was validated by blind tests on past DBMS disorder samples that DBAs had diagnosed. I validated the efficacy of anomaly period detection on 10 disorder cases. The proposed model detects multiple periods as anomalies, whereas the DBAs recorded only one period as the disorder period. However, the top-1 error period corresponded to the DBAs' opinion. I show some examples of anomaly period detection below.

Figure 7 shows DBMS disorder case 1. DBAs diagnosed the disorder period as the time when the active sessions suddenly peaked (second graph). Figure 8 shows the result of anomaly period detection by the proposed (optimal) model. The red lines indicate the upper control limit, center line, and lower control limit in SPC. The anomaly detector successfully finds all time periods in which a stat metric changes radically, especially sudden peaks. Of course, it detects the time period that the DBAs concluded was the disorder period with a much higher anomaly score.

Fig. 7. DBMS Disorder Case 1

Figure 9 shows the anomaly detection results of the (150)-(50)-(150*) model, which has the lowest test loss in the previous subsection. The anomaly score patterns are similar regardless of feature type and detect almost the same periods except for a few. In particular, the pattern is close to the 'CPU used' anomaly score pattern in Figure 8. I checked that this was not a programming mistake and that the poor results come from the absence of the BTN layer. I also found the scales in Figure 9 to be extremely high. For example, in the active session case, the selected model scores an anomaly error of up to 80 in Figure 8, but Figure 9 shows about 4000. The anomaly scores of the other features have the same problem in Figure 9.

The main causes of this problem are inferred to be threefold. First, all metrics in the training data have similar patterns at the same time. Second, the detection model is trained on multivariate time series data, not univariate, which means that relationships between input metrics affect the output; a radical change in one feature can have strong effects on other features if the model is not well trained. Third, the trained model without BTN is severely sensitive to a specific feature value, here CPU used. This is a critical problem for the model in Figure 9, considering that the data are even preprocessed before inference. Thus, the reconstruction errors of all metrics take extremely high values when a specific metric has an anomalous value. It is empirically shown that BTN prevents high dependency on specific features with non-stationary time series data. Figures 10, 11, and 12 show another test example and its anomaly detection results; they show the same behavior as case 1.

Fig. 8. Anomaly Detection Result of Case 1

Whereas DBAs and consultants detect anomaly periods in a sample with manual effort, the proposed model suggests multiple candidate anomaly periods based on a numerical anomaly score. If the results are provided to DBAs when they detect disorder periods, it helps them reach conclusions beyond their own experience and saves detection time.

C. Causal Events Detection

After detecting anomaly period samples, that is, out-of-control time periods in SPC, causal events have to be found in order to figure out the specific reasons for the disorder. After DBAs understand the main events causing the anomaly, they take solutions such as DBMS configuration changes or SQL tuning.

Time series distance/similarity measures are used to detect the causal events that are most related to the anomaly patterns of the corresponding stat metrics in the period. Dynamic Time Warping (DTW) and Pearson correlation are both calculated in order to find the event best matched with the stat metric in the anomaly period. In this paper, the events related to the anomaly period in case 1, when active sessions suddenly peak, are shown below.
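For illustration, a straightforward sketch of both measures on equal-length, per-minute series (a textbook O(nm) DTW and NumPy's Pearson correlation; the event/metric names are placeholders, not the paper's code):

import numpy as np

def dtw_distance(a, b):
    # Textbook dynamic time warping on 1-D series.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def best_matched_events(stat, events):
    # events: {event_name: 1-D series of the same length as `stat`}.
    # Lower DTW distance = closer in shape; higher Pearson correlation = closer per time step.
    by_dtw = min(events, key=lambda k: dtw_distance(stat, events[k]))
    by_corr = max(events, key=lambda k: np.corrcoef(stat, events[k])[0, 1])
    return by_dtw, by_corr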

Fig. 9. Anomaly Detection Result of Case 1 with no BTN

Fig. 10. DBMS Disorder Case 2

Fig. 11. Anomaly Detection Result of Case 2

Fig. 12. Anomaly Detection Result of Case 2 with no BTN

Fig. 13. Related Event Examples in Case 1

Figure 13 shows an example of a stat metric in the anomaly period and the best-matched events derived by the distance/similarity measures. The difference between the measures can also be seen in Figure 13. DTW does not restrict the distance calculation to corresponding time-step pairs, whereas Pearson correlation is calculated only on corresponding time steps. DTW focuses on the distance over the period from a real-value perspective, and Pearson correlation measures similarity at each time step from an increase/decrease-ratio perspective. At the time-step level, the pattern of the event with the highest Pearson correlation is similar to the pattern of active sessions. However, the value of that event does not increase dramatically.

In contrast, although the overall pattern of the event with the lowest DTW does not seem similar to active sessions, it increases dramatically to an extreme value right after the active sessions suddenly peak. Considering that causal events do not arise at the same time as the stat metrics, DTW seems to be the appropriate measure for detecting causal events in an anomaly period. However, in this model, I decided to report both measures at the DBAs' request.

V. CONCLUSION AND FUTURE WORK

I proposed a machine learning model for automatic DBMS diagnosis. The proposed model detects anomaly periods from the reconstruction error of a deep autoencoder. I also verified empirically that temporal normalization is essential when the input data are non-stationary multivariate time series.

With the SPC approach, a time period is considered an anomaly period when the reconstruction error is outside the control limits. Depending on the types or users of DBMSs, further decision rules used in SPC can be added. For example, a warning line at 2 sigma can be utilized to decide whether a period is anomalous [12, 13].

In this paper, the anomaly detection test is conducted on other DBMSs whose data are not used in training, because the performance of the basic pre-trained model is important from the service provider's perspective. The efficacy of detection performance is validated with blind tests and DBAs' opinions. The results of automatic anomaly diagnosis would help DB consultants save the time spent finding anomaly periods and main wait events. Thus, they can concentrate only on devising solutions when DB disorders occur.

For better anomaly detection performance, additional training can be performed after the pre-trained model is adopted. In addition, recurrent and convolutional neural networks can be used in the reconstruction part to capture hidden representations of sequential and local relationships. If anomaly-labeled data were generated, the detection results could be analyzed with numerical performance measures. However, in practice, it is hard to secure a labeled anomaly dataset for each DBMS. The proposed model is meaningful as an unsupervised anomaly detection model that does not need labeled data and can be generalized to other DBMSs with a pre-trained model.

ACKNOWLEDGMENT

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICT Consilience Creative Program (IITP-2017-R0346-16-1007) supervised by the IITP (Institute for Information & communications Technology Promotion).

REFERENCES

[1] Van Aken, Dana, et al. "Automatic Database Management System Tuning Through Large-scale Machine Learning." Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 2017.
[2] Andrews, Jerone T. A., Edward J. Morton, and Lewis D. Griffin. "Detecting anomalous data using auto-encoders." International Journal of Machine Learning and Computing 6.1 (2016): 21.
[3] Benesty, Jacob, et al. "Pearson correlation coefficient." Noise Reduction in Speech Processing. Springer Berlin Heidelberg, 2009. 1-4.
[4] Berndt, D. J., and J. Clifford. "Using dynamic time warping to find patterns in time series." KDD Workshop, 1994.
[5] Chandola, Varun, Arindam Banerjee, and Vipin Kumar. "Anomaly detection: A survey." ACM Computing Surveys (CSUR) 41.3 (2009): 15.
[6] Hoffmann, Heiko. "Kernel PCA for novelty detection." Pattern Recognition 40.3 (2007): 863-874.
[7] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
[8] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. 2015.
[9] Keogh, E. J., and C. A. Ratanamahatana. "Exact indexing of dynamic time warping." Knowledge and Information Systems 7.3 (2005).
[10] Kingma, Diederik, and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
[11] Malhotra, Pankaj, et al. "Long short term memory networks for anomaly detection in time series." Proceedings. Presses universitaires de Louvain, 2015.
[12] Montgomery, Douglas C. Statistical Quality Control. Vol. 7. New York: Wiley, 2009.
[13] Oakland, John S. Statistical Process Control. Routledge, 2007.
[14] Pimentel, Marco A. F., et al. "A review of novelty detection." Signal Processing 99 (2014): 215-249.
[15] Shyu, Mei-Ling, et al. A Novel Anomaly Detection Scheme Based on Principal Component Classifier. Miami Univ Coral Gables FL, Dept of Electrical and Computer Engineering, 2003.
[16] Sölch, Maximilian, et al. "Variational inference for on-line anomaly detection in high-dimensional time series." arXiv preprint arXiv:1602.07109 (2016).
[17] Stonebraker, M., S. Madden, and P. Dubey. "Intel 'big data' science and technology center vision and execution plan." SIGMOD Record 42.1 (2013): 44-49.
[18] Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning." AAAI. 2017.
[19] Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[20] Wei, William W. S. Time Series Analysis: Univariate and Multivariate Methods. Pearson Addison Wesley, 2006.
[21] Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).