
Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions

Andrzej CICHOCKI

RIKEN Brain Science Institute, Japan, and Systems Research Institute of the Polish Academy of Science, Poland

[email protected]

Part of this work was presented at the International Workshop on Smart Info-Media Systems in Asia (invited talk, SISA-2013), Sept. 30–Oct. 2, 2013, Nagoya, Japan.

Abstract—Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) often provide a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Big data analytics require novel technologies to efficiently process huge datasets within tolerable elapsed times. Such a new emerging technology for multidimensional big data is multiway analysis via tensor networks (TNs) and tensor decompositions (TDs), which represent tensors by sets of factor (component) matrices and lower-order (core) tensors. Dynamic tensor analysis allows us to discover meaningful hidden structures in complex data and to perform generalizations by capturing multi-linear and multi-aspect relationships. We discuss some fundamental TN models, their mathematical and graphical descriptions, and associated learning algorithms for large-scale TDs and TNs, with many potential applications including: anomaly detection, feature extraction, classification, cluster analysis, data fusion and integration, pattern recognition, predictive modeling, regression, time series analysis and multiway component analysis.

Keywords: Large-scale HOSVD, tensor decompositions, CPD, Tucker models, Hierarchical Tucker (HT) decomposition, low-rank tensor approximations (LRA), tensorization/quantization, tensor train (TT/QTT), Matrix Product States (MPS), Matrix Product Operator (MPO), DMRG, Strong Kronecker Product (SKP).

I. Introduction and Motivations

Big Data consists of multidimensional, multi-modal data-sets that are so huge and complex that they cannot be easily stored or processed using standard computers. Big data are characterized not only by big Volume but also by other specific “V” features (see Fig. 1). High Volume implies the need for algorithms that are scalable; high Velocity addresses the challenges of processing data in near real-time, or virtually real-time; high Veracity demands robust and predictive algorithms for noisy, incomplete or inconsistent data; and, finally, high Variety may require integration across different kinds of data,

e.g., neuroimages, time series, spiking trains, genetic and behavior data.


Figure 1: Big Data analysis for neuroscience recordings. Brain data can be recorded by electroencephalography (EEG), electrocorticography (ECoG), magnetoencephalography (MEG), fMRI, DTI, PET, and Multi Unit Recording (MUR). This involves analysis of multiple-modality, multiple-subject neuroimages, spectrograms, time series, genetic and behavior data. One of the challenges in computational and systems neuroscience is to fuse (assimilate) such data and to understand the multiple relationships among them in tasks such as perception, cognition and social interactions. The four “V”s of big data: Volume - the scale of data, Variety - different forms of data, Veracity - uncertainty of data, and Velocity - analysis of streaming data, comprise the challenges ahead of us.

Many challenging problems for big data are related to capturing, managing, searching, visualizing, clustering, classifying, assimilating, merging, and processing the data within a tolerable elapsed time, hence demanding new innovative solutions and technologies. Such an emerging technology is Tensor Decompositions (TDs) and Tensor Networks (TNs) via low-rank matrix/tensor approximations. The challenge is how to analyze large-scale, multiway data sets.



Figure 2: A 3rd-order tensor X ∈ R^{I×J×K}, with entries x_{i,j,k} = X(i, j, k), and its sub-tensors: slices and fibers. All fibers are treated as column vectors.

Data explosion creates deep research challenges that require new scalable TD and TN algorithms.

Tensors are adopted in diverse branches of science such as data analysis, signal and image processing [1]–[4], psychometrics, chemometrics, biometrics, quantum physics/information, and quantum chemistry [5]–[7]. Modern scientific areas such as bioinformatics or computational neuroscience generate massive amounts of data collected in various forms of large-scale, sparse tabular data, graphs or networks with multiple aspects and high dimensionality.

Tensors, which are multi-dimensional generalizations of matrices (see Fig. 2 and Fig. 3), often provide a useful representation for such data. Tensor decompositions (TDs) decompose data tensors into factor matrices, while tensor networks (TNs) represent higher-order tensors by interconnected lower-order tensors. We show that TDs and TNs provide natural extensions of blind source separation (BSS) and 2-way (matrix) Component Analysis (2-way CA) to multi-way component analysis (MWCA) methods. In addition, TD and TN algorithms are suitable for dimensionality reduction and can handle missing values and noisy data. Moreover, they are potentially useful for the analysis of linked (coupled) blocks of tensors with millions and even billions of non-zero entries, using the map-reduce paradigm as well as divide-and-conquer approaches [8]–[10].


Figure 3: Block matrices and their representations by (a) a 3rd-order tensor and (b) a 4th-order tensor.

All this suggests that multidimensional data can be represented by linked multi-block tensors, which can be decomposed into common (or correlated) and distinctive (uncorrelated, independent) components [3], [11], [12]. Effective analysis of coupled tensors requires the development of new models and associated algorithms that can identify the core relations that exist among the different tensor modes, and at the same time scale to extremely large datasets. Our objective is to develop suitable models and algorithms for linked low-rank tensor approximations (TAs), and associated scalable software to make such analysis possible.

Review and tutorial papers [2], [4], [13]–[15] and books [1], [5], [6] dealing with TDs already exist; however, they typically focus on standard TDs and/or do not provide explicit links to big data processing topics, and do not explore natural connections with emerging areas including multi-block coupled tensor analysis and tensor networks. This paper extends beyond the standard TD models and aims to elucidate the power and flexibility of TNs in the analysis of multi-dimensional, multi-modal, and multi-block data, together with their role as a mathematical backbone for the discovery of hidden structures in large-scale data [1], [2], [4].

Motivations - Why low-rank tensor approximations? A wealth of literature on (2-way) component analysis (CA) and BSS exists, especially on Principal Component Analysis (PCA), Independent Component Analysis (ICA), Sparse Component Analysis (SCA), Nonnegative Matrix Factorization (NMF), and Morphological Component Analysis (MCA) [1], [16], [17]. These techniques are maturing, and are promising tools for blind source separation (BSS), dimensionality reduction, feature extraction, clustering, classification, and visualization [1], [17].



Figure 4: Basic symbols for tensor network diagrams.

The “flattened view” provided by 2-way CA and matrix factorizations (PCA/SVD, NMF, SCA, MCA) may be inappropriate for large classes of real-world data which exhibit multiple couplings and cross-correlations. In this context, higher-order tensor networks give us the opportunity to develop more sophisticated models that perform distributed computing and capture multiple interactions and couplings, instead of standard pairwise interactions. In other words, to discover hidden components within multiway data, the analysis tools should account for intrinsic multi-dimensional distributed patterns present in the data.

II. Basic Tensor Operations

A higher-order tensor can be interpreted as a multiway array, as illustrated graphically in Figs. 2, 3 and 4. Our adopted convention is that tensors are denoted by bold underlined capital letters, e.g., X ∈ R^{I_1×I_2×···×I_N}, and that all data are real-valued. The order of a tensor is the number of its “modes”, “ways” or “dimensions”, which can include space, time, frequency, trials, classes, and dictionaries. Matrices (2nd-order tensors) are denoted by boldface capital letters, e.g., X, and vectors (1st-order tensors) by boldface lowercase letters; for instance, the columns of the matrix A = [a_1, a_2, . . . , a_R] ∈ R^{I×R} are denoted by a_r, and elements of a matrix (scalars) are denoted by lowercase letters, e.g., a_{ir} (see Table I).

The most common types of tensor multiplications are denoted by: ⊗ for the Kronecker product, ⊙ for the Khatri-Rao product, ⊛ for the Hadamard (componentwise) product, ◦ for the outer product, and ×_n for the mode-n product (see Table II).

TNs and TDs can be represented by tensor network diagrams, in which tensors are represented graphically by nodes of various shapes (e.g., circles, spheres, triangles, squares, ellipses), and each outgoing edge (line) emerging from a shape represents a mode (a way, dimension, or index) (see Fig. 4). Tensor network diagrams are very useful not only for visualizing tensor decompositions, but also for their various transformations/reshapings and for graphical illustrations of mathematical (multilinear) operations.

It should also be noted that block matrices and hierarchical block matrices can be represented by tensors.

TABLE I: Basic tensor notation and symbols. Tensors are denoted by underlined bold capital letters, matrices by uppercase bold letters, vectors by lowercase boldface letters and scalars by lowercase letters.

X ∈ R^{I_1×I_2×···×I_N} - Nth-order tensor of size I_1 × I_2 × · · · × I_N

G, G_r, G_X, G_Y, S - core tensors

Λ ∈ R^{R×R×···×R} - Nth-order diagonal core tensor with nonzero entries λ_r on the main diagonal

A = [a_1, a_2, . . . , a_R] ∈ R^{I×R} - matrix with column vectors a_r ∈ R^I and entries a_{ir}

A, B, C, B^{(n)}, U^{(n)} - component matrices

i = [i_1, i_2, . . . , i_N] - vector of indices

X_{(n)} ∈ R^{I_n × I_1···I_{n−1}I_{n+1}···I_N} - mode-n unfolding of X

x_{:,i_2,i_3,...,i_N} - mode-1 fiber of X obtained by fixing all but one index

X_{:,:,i_3,...,i_N} - tensor slice of X obtained by fixing all but two indices

X_{:,:,:,i_4,...,i_N} - subtensor of X, in which several indices are fixed

x = vec(X) - vectorization of X

diag{•} - diagonal matrix

For example, 3rd-order and 4th-order tensors can be represented by block matrices, as illustrated in Fig. 3, and all algebraic operations can be performed on block matrices. Analogously, higher-order tensors can be represented as illustrated in Fig. 5 and Fig. 6. Subtensors are formed when a subset of indices is fixed. Of particular interest are fibers, defined by fixing every index but one, and matrix slices, which are two-dimensional sections (matrices) of a tensor obtained by fixing all the indices but two (see Fig. 2). A matrix has two modes: rows and columns, while an Nth-order tensor has N modes.

The process of unfolding (see Fig. 7) flattens a tensor into a matrix. In the simplest scenario, the mode-n unfolding (matricization, flattening) of a tensor A ∈ R^{I_1×I_2×···×I_N} yields a matrix A_{(n)} ∈ R^{I_n×(I_1···I_{n−1}I_{n+1}···I_N)}, with entries a_{i_n,(i_1,...,i_{n−1},i_{n+1},...,i_N)} such that the remaining indices (i_1, . . . , i_{n−1}, i_{n+1}, . . . , i_N) are arranged in a specific order, e.g., in lexicographical order [4]. In tensor networks we typically use a generalized mode-([n]) unfolding, as illustrated in Fig. 7 (b).
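As an illustration only, here is a minimal NumPy sketch of the mode-n unfolding described above (the function names unfold/fold are ours, not from the paper); it moves mode n to the front and flattens the remaining modes in one fixed, consistent order:

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding A_(n): move mode n to the front, flatten the rest.

    The ordering of the remaining modes here is one specific (consistent)
    choice; the text only requires that some fixed order is used.
    """
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def fold(A_n, n, shape):
    """Inverse of unfold: restore the original tensor shape."""
    full_shape = (shape[n],) + tuple(s for i, s in enumerate(shape) if i != n)
    return np.moveaxis(A_n.reshape(full_shape), 0, n)

# quick check on a random 3rd-order tensor
A = np.random.rand(3, 4, 5)
assert np.allclose(fold(unfold(A, 1), 1, A.shape), A)
```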

By a multi-index i = \overline{i_1 i_2 · · · i_N} we denote an index which takes all possible combinations of values of i_1, i_2, . . . , i_N, for i_n = 1, 2, . . . , I_n, in a specific and consistent order. The entries of matrices or tensors in matricized and/or vectorized forms can be ordered in at least two different ways.



Figure 5: Hierarchical block matrices and their representations as tensors: (a) a 4th-order tensor for a block matrix X ∈ R^{R_1 I_1 × R_2 I_2}, comprising block matrices X_{r_1,r_2} ∈ R^{I_1×I_2}; (b) a 5th-order tensor; and (c) a 6th-order tensor.

Remark: The multi-index can be defined using two different conventions:

1) The little-endian convention:

\overline{i_1 i_2 · · · i_N} = i_1 + (i_2 − 1) I_1 + (i_3 − 1) I_1 I_2 + · · · + (i_N − 1) I_1 I_2 · · · I_{N−1}.   (1)

2) The big-endian convention:

\overline{i_1 i_2 · · · i_N} = i_N + (i_{N−1} − 1) I_N + (i_{N−2} − 1) I_N I_{N−1} + · · · + (i_1 − 1) I_2 · · · I_N.   (2)

The little-endian notation is consistent with the Fortran style of indexing, while the big-endian notation is similar to numbers written in the positional system and corresponds to reverse lexicographic order. The definition of the unfolding of tensors and of the Kronecker (tensor) product ⊗ should also be consistent with the chosen convention¹.


Figure 6: Graphical representations and symbols for higher-order block tensors (4th-, 5th- and 6th-order). Each block represents a 3rd-order or 2nd-order tensor. An external circle represents the global structure of the block tensor (e.g., a vector, a matrix, a 3rd-order tensor) and an inner circle represents the structure of each element of the block tensor.

In this paper we will use the big-endian notation; however, it is enough to remember that c = a ⊗ b means that c_{\overline{ij}} = a_i b_j.
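As a concrete illustration (ours, not from the paper), the two conventions correspond to NumPy's Fortran-style and C-style raveling of a multi-index:

```python
import numpy as np

# mode sizes (I_1, I_2, I_3) and a multi-index; NumPy is 0-based, while
# Eqs. (1)-(2) are written for 1-based indices, so the formulas shift by one.
I = (2, 3, 4)
idx = (0, 1, 2)

# little-endian convention, Eq. (1): the first index varies fastest (Fortran order)
little = np.ravel_multi_index(idx, I, order='F')   # 0 + 1*2 + 2*(2*3) = 14

# big-endian convention, Eq. (2): the last index varies fastest (C order)
big = np.ravel_multi_index(idx, I, order='C')      # 2 + 1*4 + 0*(3*4) = 6

print(little, big)  # 14 6
```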

The Kronecker product of two tensors A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_1×J_2×···×J_N} yields C = A ⊗ B ∈ R^{I_1 J_1 × I_2 J_2 × ··· × I_N J_N}, with entries c_{\overline{i_1 j_1},...,\overline{i_N j_N}} = a_{i_1,...,i_N} b_{j_1,...,j_N}, where \overline{i_n j_n} = j_n + (i_n − 1) J_n.

The mode-n product of a tensor A ∈ R^{I_1×···×I_N} by a vector b ∈ R^{I_n} is defined as a tensor C = A ×_n b ∈ R^{I_1×···×I_{n−1}×I_{n+1}×···×I_N}, with entries c_{i_1,...,i_{n−1},i_{n+1},...,i_N} = ∑_{i_n=1}^{I_n} a_{i_1,i_2,...,i_N} b_{i_n}, while the mode-n product of a tensor A ∈ R^{I_1×···×I_N} by a matrix B ∈ R^{J×I_n} is the tensor C = A ×_n B ∈ R^{I_1×···×I_{n−1}×J×I_{n+1}×···×I_N}, with entries c_{i_1,i_2,...,i_{n−1},j,i_{n+1},...,i_N} = ∑_{i_n=1}^{I_n} a_{i_1,i_2,...,i_N} b_{j,i_n}. This can also be expressed in matrix form as C_{(n)} = B A_{(n)} (see Fig. 8), which allows us to employ fast matrix-by-vector and matrix-by-matrix multiplications for very large-scale problems.

If we take all the modes, then we have a full multilinear product of a tensor and a set of matrices, which is compactly written as [4] (see Fig. 9 (a)):

C = G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)} = ⟦G; B^{(1)}, B^{(2)}, . . . , B^{(N)}⟧.   (3)
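A minimal NumPy sketch of the mode-n (matrix) product and the full multilinear product (3), computed via the unfolding relation C_{(n)} = B A_{(n)} (function names are ours):

```python
import numpy as np

def mode_n_product(A, B, n):
    """C = A x_n B, computed via C_(n) = B @ A_(n) and refolding."""
    A_n = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)   # mode-n unfolding
    C_n = B @ A_n
    new_shape = (B.shape[0],) + tuple(s for i, s in enumerate(A.shape) if i != n)
    return np.moveaxis(C_n.reshape(new_shape), 0, n)

def multilinear_product(G, Bs):
    """C = G x_1 B^(1) x_2 B^(2) ... x_N B^(N), as in Eq. (3)."""
    C = G
    for n, B in enumerate(Bs):
        C = mode_n_product(C, B, n)
    return C

# example: 3rd-order core and three factor matrices
G = np.random.rand(2, 3, 4)
Bs = [np.random.rand(5, 2), np.random.rand(6, 3), np.random.rand(7, 4)]
C = multilinear_product(G, Bs)
print(C.shape)  # (5, 6, 7)
```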

In a similar way to the mode-n multilinear product, we can define the mode-(m, n) product (tensor contraction) of two tensors.

¹The standard and more popular definition in multilinear algebra assumes the big-endian convention, while for the development of efficient program code for big data the little-endian convention usually seems to be more convenient (see the papers of Dolgov and Savostyanov [18], [19] for more detail).



Figure 7: Unfoldings in tensor networks: (a) Graphical representation of the basic mode-n unfolding (matricization, flattening) A_{(n)} ∈ R^{I_n × I_1···I_{n−1}I_{n+1}···I_N} for an Nth-order tensor A ∈ R^{I_1×I_2×···×I_N}. (b) More general unfolding of an Nth-order tensor into a matrix A_{([n])} = A_{(i_1,...,i_n ; i_{n+1},...,i_N)} ∈ R^{I_1 I_2···I_n × I_{n+1}···I_N}. All entries of an unfolded tensor are arranged in a specific order, e.g., in lexicographical order. In a more general case, let r = {m_1, m_2, . . . , m_R} ⊂ {1, 2, . . . , N} be the row indices and c = {n_1, n_2, . . . , n_C} ⊂ {1, 2, . . . , N} − r be the column indices; then the mode-(r, c) unfolding of A is denoted as A_{(r,c)} ∈ R^{I_{m_1} I_{m_2}···I_{m_R} × I_{n_1} I_{n_2}···I_{n_C}}.


Figure 8: From a matrix format to the tensor network format. (a) Multilinear mode-1 product of a 3rd-order tensor A ∈ R^{I_1×I_2×I_3} and a factor (component) matrix B ∈ R^{J×I_1} yields a tensor C = A ×_1 B ∈ R^{J×I_2×I_3}. This is equivalent to the simple matrix multiplication formula C_{(1)} = B A_{(1)}. (b) Multilinear mode-n product of an Nth-order tensor and a factor matrix B ∈ R^{J×I_n}.

The mode-(m, n) product of A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_1×J_2×···×J_M}, with common modes I_n = J_m, yields an (N + M − 2)th-order tensor C ∈ R^{I_1×···×I_{n−1}×I_{n+1}×···×I_N×J_1×···×J_{m−1}×J_{m+1}×···×J_M}:

C = A ×^m_n B,   (4)

with entries c_{i_1,...,i_{n−1}, i_{n+1},...,i_N, j_1,...,j_{m−1}, j_{m+1},...,j_M} = ∑_{i=1}^{I_n} a_{i_1,...,i_{n−1}, i, i_{n+1},...,i_N} b_{j_1,...,j_{m−1}, i, j_{m+1},...,j_M} (see Fig. 10 (a)).

TABLE II: Basic tensor/matrix operations.

C = A ×_n B - mode-n product of A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_n×I_n} yields C ∈ R^{I_1×···×I_{n−1}×J_n×I_{n+1}×···×I_N}, with entries c_{i_1,...,i_{n−1}, j, i_{n+1},...,i_N} = ∑_{i_n=1}^{I_n} a_{i_1,...,i_n,...,i_N} b_{j, i_n}, or equivalently C_{(n)} = B A_{(n)}

C = ⟦A; B^{(1)}, . . . , B^{(N)}⟧ = A ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)} - full multilinear product

C = A ◦ B - tensor or outer product of A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_1×J_2×···×J_M} yields an (N + M)th-order tensor C, with entries c_{i_1,...,i_N, j_1,...,j_M} = a_{i_1,...,i_N} b_{j_1,...,j_M}

X = a ◦ b ◦ c ∈ R^{I×J×K} - tensor or outer product of vectors forms a rank-1 tensor, with entries x_{ijk} = a_i b_j c_k

A^T, A^{−1}, A^† - transpose, inverse and Moore-Penrose pseudo-inverse of A

C = A ⊗ B - Kronecker product of A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_1×J_2×···×J_N} yields C ∈ R^{I_1 J_1×···×I_N J_N}, with entries c_{\overline{i_1 j_1},...,\overline{i_N j_N}} = a_{i_1,...,i_N} b_{j_1,...,j_N}, where \overline{i_n j_n} = j_n + (i_n − 1) J_n

C = A ⊙ B - Khatri-Rao product of A ∈ R^{I×J} and B ∈ R^{K×J} yields C ∈ R^{IK×J}, with columns c_j = a_j ⊗ b_j

This operation can be considered as a contraction of two tensors in a single common mode. Tensors can be contracted in several modes, or even in all modes, as illustrated in Fig. 10.
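A brief NumPy illustration of such contractions using np.tensordot (our sketch, not from the paper): a single-mode contraction as in Eq. (4) and an all-mode contraction (inner product).

```python
import numpy as np

A = np.random.rand(3, 4, 5)        # I1 x I2 x I3
B = np.random.rand(5, 6, 7)        # J1 x J2 x J3, with I3 = J1 = 5

# single common mode: contract mode 3 of A with mode 1 of B,
# giving a (3 + 3 - 2) = 4th-order tensor of shape (3, 4, 6, 7)
C = np.tensordot(A, B, axes=([2], [0]))
print(C.shape)

# inner product of two tensors of the same shape (contraction over all modes)
D = np.random.rand(3, 4, 5)
c = np.tensordot(A, D, axes=([0, 1, 2], [0, 1, 2]))   # a scalar
print(float(c))
```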

When it is not confusing, the super- or sub-indices m, n can be omitted. For example, the multilinear product of the tensors A ∈ R^{I_1×I_2×···×I_N} and B ∈ R^{J_1×J_2×···×J_M}, with a common mode I_N = J_1, can be written as

C = A ×^1_N B = A ×_N B = A • B ∈ R^{I_1×I_2×···×I_{N−1}×J_2×···×J_M},   (5)

with entries c_{i_1,i_2,...,i_{N−1},j_2,...,j_M} = ∑_{i=1}^{I_N} a_{i_1,...,i_{N−1},i} b_{i,j_2,...,j_M}.

Furthermore, note that for multiplications of matrices and vectors this notation implies that A ×^1_2 B = AB, A ×^2_2 B = AB^T, A ×^{1,2}_{1,2} B = ⟨A, B⟩, and A ×^1_2 x = A ×_2 x = Ax.

Remark: If we use contractions of more than two tensors, the order has to be specified (defined) as follows: A ×^b_a B ×^d_c C = A ×^b_a (B ×^d_c C) for b < c.

The outer or tensor product C = A ◦ B of the tensors A ∈ R^{I_1×···×I_N} and B ∈ R^{J_1×···×J_M} is the tensor C ∈ R^{I_1×···×I_N×J_1×···×J_M}, with entries c_{i_1,...,i_N,j_1,...,j_M} = a_{i_1,...,i_N} b_{j_1,...,j_M}.



Figure 9: Multilinear products via tensor network diagrams. (a) Multilinear full product of a tensor (Tucker decomposition) G ∈ R^{R_1×R_2×···×R_5} and factor (component) matrices B^{(n)} ∈ R^{I_n×R_n} (n = 1, 2, . . . , 5) yields the Tucker tensor decomposition C = G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_5 B^{(5)} ∈ R^{I_1×I_2×···×I_5}. (b) Multilinear product of a tensor A ∈ R^{I_1×I_2×···×I_4} and vectors b^{(n)} ∈ R^{I_n} (n = 1, 2, 3) yields a vector c = A ×_1 b^{(1)} ×_2 b^{(2)} ×_3 b^{(3)} ∈ R^{I_4}.


Figure 10: Examples of tensor contractions: (a) Multilinear product of two tensors, denoted by C = A ×^m_n B. (b) Inner product of two 3rd-order tensors yields a scalar c = ⟨A, B⟩ = A ×^{1,2,3}_{1,2,3} B = A × B = ∑_{i_1,i_2,i_3} a_{i_1,i_2,i_3} b_{i_1,i_2,i_3}. (c) Tensor contraction of two 4th-order tensors yields C = A ×^{1,2}_{4,3} B ∈ R^{I_1×I_2×J_3×J_4}, with entries c_{i_1,i_2,j_3,j_4} = ∑_{i_3,i_4} a_{i_1,i_2,i_3,i_4} b_{i_4,i_3,j_3,j_4}. (d) Tensor contraction of two 5th-order tensors yields a 4th-order tensor C = A ×^{1,2,3}_{3,4,5} B ∈ R^{I_1×I_2×J_4×J_5}, with entries c_{i_1,i_2,j_4,j_5} = ∑_{i_3,i_4,i_5} a_{i_1,i_2,i_3,i_4,i_5} b_{i_5,i_4,i_3,j_4,j_5}.

Specifically, the outer product of two nonzero vectors a ∈ R^I and b ∈ R^J produces a rank-1 matrix X = a ◦ b = a b^T ∈ R^{I×J}, and the outer product of three nonzero vectors a ∈ R^I, b ∈ R^J and c ∈ R^K produces a 3rd-order rank-1 tensor X = a ◦ b ◦ c ∈ R^{I×J×K}, whose entries are x_{ijk} = a_i b_j c_k. A tensor X ∈ R^{I_1×I_2×···×I_N} is said to be rank-1 if it can be expressed exactly as X = b_1 ◦ b_2 ◦ · · · ◦ b_N, with entries x_{i_1,i_2,...,i_N} = b_{1,i_1} b_{2,i_2} · · · b_{N,i_N}, where b_n ∈ R^{I_n} are nonzero vectors. We refer to [1], [4] for more detail regarding the basic notations and tensor operations.
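For instance, a rank-1 tensor can be built from outer products of vectors as follows (a small NumPy illustration of the definition above; nothing here is specific to the paper):

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)

# 3rd-order rank-1 tensor X = a o b o c, with entries x_ijk = a_i * b_j * c_k
X = np.einsum('i,j,k->ijk', a, b, c)

# spot-check one entry against the definition
assert np.isclose(X[1, 2, 3], a[1] * b[2] * c[3])
print(X.shape)  # (3, 4, 5)
```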


Figure 11: Examples of tensor networks. Illustration of the representation of a 9th-order tensor X ∈ R^{I_1×I_2×···×I_9} by different kinds of tensor networks (TNs): the Tensor Train (TT), which is equivalent to the Matrix Product State (MPS) with open boundary conditions (OBC); the Projected Entangled-Pair State (PEPS), also called Tensor Product States (TPS); and the Hierarchical Tucker (HT) decomposition, which is equivalent to the Tree-Tensor Network State (TTNS). The objective is to decompose a high-order tensor into sparsely connected low-order and low-rank tensors, typically 3rd-order and/or 4th-order tensors, called cores.

III. Tensor Networks

A tensor network aims to represent or decompose a higher-order tensor into a set of lower-order tensors (typically, 2nd-order tensors (matrices) and 3rd-order tensors, called cores or components) which are sparsely interconnected. In other words, in contrast to TDs, TNs represent decompositions of the data tensors into a set of sparsely (weakly) interconnected lower-order tensors. Recently, the curse of dimensionality for higher-order tensors has been considerably alleviated or even completely avoided through the concept of tensor networks (TNs) [20], [21]. A TN can be represented by a set of nodes interconnected by lines. The lines (leads, branches, edges) connecting tensors with each other correspond to contracted modes, whereas lines that do not go from one tensor to another correspond to open (physical) modes in the TN (see Fig. 11).

An edge connecting two nodes indicates a contraction of the respective tensors in the associated pair of modes, as illustrated in Fig. 10. Each free (dangling) edge corresponds to a mode that is not contracted and, hence, the order of the entire tensor network is given by the number of free edges (often called physical indices). A tensor network may not contain any loops, i.e., any edges connecting a node with itself. Some examples of tensor network diagrams are given in Fig. 11.

If a tensor network is a tree, i.e., it does not contain any cycles, each of its edges splits the modes of the data tensor into two groups, which is related to a suitable matricization of the tensor. If, in such a tree tensor network, all nodes have degree 3 or less, it corresponds to a Hierarchical Tucker (HT) decomposition, shown in Fig. 12 (a). The HT decomposition was first introduced


in scientific computing by Hackbusch and Kuhn, and further developed by Grasedyck, Kressner, Tobler and others [7], [22]–[26]. Note that for a 6th-order tensor there are two such tensor networks (see Fig. 12 (b)), and for a 10th-order tensor there are 11 possible HT decompositions [24], [25].

A simple approach to reducing the size of core tensors is to apply distributed tensor networks (DTNs), which consist of two kinds of cores (nodes): internal cores (nodes), which have no free edges, and external cores, which have free edges representing physical indices of a data tensor, as illustrated in Figs. 12 and 13.

The idea, in the case of the Tucker model, is that the core tensor is replaced by distributed, sparsely interconnected cores of lower order, resulting in a Hierarchical Tucker (HT) network in which only some cores are connected (associated) directly with factor matrices [7], [22], [23], [26].

For some very high-order data tensors it has been observed that the ranks R_n (internal dimensions of cores) increase rapidly with the order of the tensor and/or with an increasing accuracy of approximation, for any choice of tensor network that is a tree (including TT and HT decompositions) [25]. For such cases, the Projected Entangled-Pair State (PEPS) or the Multi-scale Entanglement Renormalization Ansatz (MERA) tensor networks can be used. These contain cycles, but have hierarchical structures (see Fig. 13). For the PEPS and MERA TNs the ranks can be kept considerably smaller, at the cost of employing 5th- and 4th-order cores and, consequently, a higher computational complexity w.r.t. tensor contractions. The main advantage of PEPS and MERA is that the size of each core tensor in the internal tensor network structure is usually much smaller than the cores in TT/HT decompositions, so consequently the total number of parameters can be reduced. However, it should be noted that the contraction of the resulting tensor network becomes more difficult when compared to the basic tree structures represented by TT and HT models. This is due to the fact that the PEPS and MERA tensor networks contain loops.

IV. Basic Tensor Decompositions and their Representation via Tensor Network Diagrams

The main objective of a standard tensor decomposition is to factorize a data tensor into physically interpretable or meaningful factor matrices and a single core tensor which indicates the links between components (vectors of factor matrices) in different modes.

A. Constrained Matrix Factorizations and Decompositions – Two-Way Component Analysis

Two-way Component Analysis (2-way CA) exploits a priori knowledge about different characteristics, features or morphology of components (or source signals) [1], [27] to find the hidden components through constrained matrix factorizations.


Figure 12: (a) The standard Tucker decomposition and its transformation into a Hierarchical Tucker (HT) model for an 8th-order tensor, using interconnected 3rd-order core tensors. (b) Various exemplary structures of HT/TT models for data tensors of different orders (3 to 8). Green circles indicate factor matrices while red circles indicate cores.



Figure 13: Alternative distributed representation of an 8th-order Tucker decomposition, where the core tensor is replaced by a MERA (Multi-scale Entanglement Renormalization Ansatz) tensor network which employs 3rd-order and 4th-order core tensors. For some data-sets, the advantage of such a model is the relatively small size (dimensions) of the distributed cores.

Such constrained matrix factorizations take the form

X = A B^T + E = ∑_{r=1}^{R} a_r ◦ b_r + E = ∑_{r=1}^{R} a_r b_r^T + E,   (6)

where the constraints imposed on the factor matrices A and/or B include orthogonality, sparsity, statistical independence, nonnegativity or smoothness. The CA can be considered as a bilinear (2-way) factorization, where X ∈ R^{I×J} is a known matrix of observed data, E ∈ R^{I×J} represents residuals or noise, A = [a_1, a_2, . . . , a_R] ∈ R^{I×R} is the unknown (usually full column rank R) mixing matrix with R basis vectors a_r ∈ R^I, and B = [b_1, b_2, . . . , b_R] ∈ R^{J×R} is the matrix of unknown components (factors, latent variables, sources).

Two-way component analysis (CA) refers to a class of signal processing techniques that decompose or encode superimposed or mixed signals into components with certain constraints or properties. The CA methods exploit a priori knowledge about the true nature or diversities of latent variables. By diversity, we refer to different characteristics, features or morphology of sources or hidden latent variables [27].

For example, the columns of the matrix B that represent different data sources should be: as statistically independent as possible for ICA; as sparse as possible for SCA; and take only nonnegative values for NMF [1], [16], [27].

Remark: Note that matrix factorizations have an inherent symmetry; Eq. (6) could be written as X^T ≈ B A^T, thus interchanging the roles of sources and mixing process.

Singular value decomposition (SVD) of the data matrix X ∈ R^{I×J} is a special case of the factorization in Eq. (6). It is exact and provides an explicit notion of the range and null space of the matrix X (key issues in low-rank approximation), and is given by

X = U Σ V^T = ∑_{r=1}^{R} σ_r u_r v_r^T = ∑_{r=1}^{R} σ_r u_r ◦ v_r,   (7)

where U and V are column-wise orthonormal matrices and Σ is a diagonal matrix containing only nonnegative singular values σ_r.
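A minimal NumPy illustration of the rank-R truncated form of Eq. (7) (our sketch; the data and the target rank are arbitrary):

```python
import numpy as np

X = np.random.rand(50, 30)
R = 5                                          # target rank

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]    # best rank-R approximation in the LS sense

print(np.linalg.norm(X - X_R) / np.linalg.norm(X))  # relative approximation error
```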

Another virtue of component analysis comes from the representation of multiple-subject, multiple-task datasets by a set of data matrices X_k, allowing us to perform simultaneous matrix factorizations:

X_k ≈ A_k B_k^T,   (k = 1, 2, . . . , K),   (8)

subject to various constraints. In the case of statistical independence constraints, the problem can be related to models of group ICA through suitable pre-processing, dimensionality reduction and post-processing procedures [28].

The field of CA is maturing and has generated efficient algorithms for 2-way component analysis (especially for sparse/functional PCA/SVD, ICA, NMF and SCA) [1], [16], [29]. The rapidly emerging field of tensor decompositions is the next important step that naturally generalizes 2-way CA/BSS algorithms and paradigms. We proceed to show how constrained matrix factorizations and component analysis (CA) models can be naturally generalized to multilinear models using constrained tensor decompositions, such as the Canonical Polyadic Decomposition (CPD) and Tucker models, as illustrated in Figs. 14 and 15.

B. The Canonical Polyadic Decomposition (CPD)

The CPD (also called PARAFAC or CANDECOMP) factorizes an Nth-order tensor X ∈ R^{I_1×I_2×···×I_N} into a linear combination of terms b_r^{(1)} ◦ b_r^{(2)} ◦ · · · ◦ b_r^{(N)}, which are rank-1 tensors, and is given by [30]–[32]

X ≅ ∑_{r=1}^{R} λ_r b_r^{(1)} ◦ b_r^{(2)} ◦ · · · ◦ b_r^{(N)}
  = Λ ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)}
  = ⟦Λ; B^{(1)}, B^{(2)}, . . . , B^{(N)}⟧,   (9)

where the only non-zero entries λ_r of the diagonal core tensor G = Λ ∈ R^{R×R×···×R} are located on the main diagonal (see Fig. 14 for 3rd-order and 4th-order tensors).

Via the Khatri-Rao products, the CPD can also be expressed in matrix/vector form as:

X_{(n)} ≅ B^{(n)} Λ (B^{(1)} ⊙ · · · ⊙ B^{(n−1)} ⊙ B^{(n+1)} ⊙ · · · ⊙ B^{(N)})^T,
vec(X) ≅ [B^{(1)} ⊙ B^{(2)} ⊙ · · · ⊙ B^{(N)}] λ,   (10)

where B^{(n)} = [b_1^{(n)}, b_2^{(n)}, . . . , b_R^{(n)}] ∈ R^{I_n×R}, λ = [λ_1, λ_2, . . . , λ_R]^T and Λ = diag{λ} is a diagonal matrix.

The rank of a tensor X is defined as the smallest R for which the CPD (9) holds exactly.

Algorithms to compute the CPD. In the presence of noise in real-world applications, the CPD is rarely exact and has to be estimated by minimizing a suitable cost function, typically of the Least-Squares (LS) type in the form of the Frobenius norm ||X − ⟦Λ; B^{(1)}, B^{(2)}, . . . , B^{(N)}⟧||_F, or using Least Absolute Error (LAE) criteria [33].



Figure 14: Representation of the CPD. (a) The Canonical Polyadic Decomposition (CPD) of a 3rd-order tensor as X ≅ Λ ×_1 B^{(1)} ×_2 B^{(2)} ×_3 B^{(3)} = ∑_{r=1}^{R} λ_r b_r^{(1)} ◦ b_r^{(2)} ◦ b_r^{(3)} = G_c ×_1 B^{(1)} ×_2 B^{(2)}, with G = Λ and G_c = Λ ×_3 B^{(3)}. (b) The CPD for a 4th-order tensor as X ≅ Λ ×_1 B^{(1)} ×_2 B^{(2)} ×_3 B^{(3)} ×_4 B^{(4)} = ∑_{r=1}^{R} λ_r b_r^{(1)} ◦ b_r^{(2)} ◦ b_r^{(3)} ◦ b_r^{(4)}. The objective of the CPD is to estimate the factor matrices B^{(n)} and the rank of the tensor, R, that is, the number of components R.

The Alternating Least Squares (ALS) algorithms [1], [13], [31], [34] minimize the LS cost function by optimizing each component matrix individually, while keeping the other component matrices fixed. For instance, assume that the diagonal matrix Λ has been absorbed into one of the component matrices; then, by taking advantage of the Khatri-Rao structure, the component matrices B^{(n)} can be updated sequentially as [4]

B^{(n)} ← X_{(n)} (⊙_{k≠n} B^{(k)}) (⊛_{k≠n} (B^{(k)T} B^{(k)}))^†,   (11)

which requires the computation of the pseudo-inverses of only small (R × R) matrices.
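A compact NumPy sketch of one sweep of the ALS update (11) for a 3rd-order tensor (our illustration only; the Khatri-Rao factor ordering follows Eq. (10) as written, the scaling λ is assumed absorbed into the factors, and all helper names are ours):

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product of a list of matrices (all with R columns)."""
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, R)
    return out

def cp_als_sweep(X, Bs):
    """One ALS sweep updating each factor B^(n) via Eq. (11)."""
    N = X.ndim
    for n in range(N):
        others = [Bs[k] for k in range(N) if k != n]
        KR = khatri_rao(others)                     # (x)_{k != n} B^(k)  (Khatri-Rao)
        V = np.ones((Bs[0].shape[1],) * 2)
        for B in others:                            # Hadamard product of Gram matrices
            V *= B.T @ B
        Bs[n] = unfold(X, n) @ KR @ np.linalg.pinv(V)
    return Bs

# toy example: fit a rank-3 CPD to a random 3rd-order tensor
X = np.random.rand(4, 5, 6)
R = 3
Bs = [np.random.rand(I, R) for I in X.shape]
for _ in range(20):
    Bs = cp_als_sweep(X, Bs)
```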

The ALS is attractive for its simplicity, and for well-defined problems (not too many components, well separated and not collinear) and high SNR the performance of ALS algorithms is often satisfactory. For ill-conditioned problems, more advanced algorithms exist, which typically exploit the rank-1 structure of the terms within the CPD to perform efficient computation and storage of the Jacobian and Hessian of the cost function [35], [36].

Constraints. The CPD is usually unique by itself, and does not require constraints to impose uniqueness [37]. However, if components in one or more modes are known to be, e.g., nonnegative, orthogonal, statistically independent or sparse, these constraints should be incorporated to relax uniqueness conditions. More importantly, constraints may increase the accuracy and stability of CPD algorithms and facilitate better physical interpretability of the components [38], [39].

C. The Tucker Decomposition

The Tucker decomposition can be expressed as follows [40]:

X ≅ ∑_{r_1=1}^{R_1} · · · ∑_{r_N=1}^{R_N} g_{r_1 r_2 ··· r_N} (b_{r_1}^{(1)} ◦ b_{r_2}^{(2)} ◦ · · · ◦ b_{r_N}^{(N)})
  = G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)}
  = ⟦G; B^{(1)}, B^{(2)}, . . . , B^{(N)}⟧,   (12)

where X ∈ R^{I_1×I_2×···×I_N} is the given data tensor, G ∈ R^{R_1×R_2×···×R_N} is the core tensor, and B^{(n)} = [b_1^{(n)}, b_2^{(n)}, . . . , b_{R_n}^{(n)}] ∈ R^{I_n×R_n} are the mode-n component matrices, n = 1, 2, . . . , N (see Fig. 15).

Using Kronecker products, the decomposition in (12) can be expressed in matrix and vector form as follows:

X_{(n)} ≅ B^{(n)} G_{(n)} (B^{(1)} ⊗ · · · ⊗ B^{(n−1)} ⊗ B^{(n+1)} ⊗ · · · ⊗ B^{(N)})^T,
vec(X) ≅ [B^{(1)} ⊗ B^{(2)} ⊗ · · · ⊗ B^{(N)}] vec(G).   (13)

The core tensor (typically with R_n < I_n) models a potentially complex pattern of mutual interaction between the vectors (components) in different modes.

Multilinear rank. The N-tuple (R_1, R_2, . . . , R_N) is called the multilinear rank of X if the Tucker decomposition (12) holds exactly.

Note that the CPD can be considered as a special case of the Tucker decomposition, in which the core tensor has nonzero elements only on the main diagonal. In contrast to the CPD, the Tucker decomposition is, in general, not unique. However, constraints imposed on all factor matrices and/or the core tensor can reduce the indeterminacies to only column-wise permutation and scaling [41].

In the Tucker model, some selected factor matrices can be identity matrices; this leads to the Tucker-(K, N) model, which is graphically illustrated in Fig. 16 (a). In such a model, (N − K) factor matrices are equal to identity matrices. In the simplest scenario, for a 3rd-order tensor X ∈ R^{I_1×I_2×I_3}, the Tucker-(2,3) model, called simply Tucker-2, can be described as

X ≅ G ×_1 B^{(1)} ×_2 B^{(2)}.   (14)



Figure 15: Representation of the Tucker Decomposition (TD). (a) TD of a 3rd-order tensor X ≅ G ×_1 B^{(1)} ×_2 B^{(2)} ×_3 B^{(3)}. The objective is to compute the factor matrices B^{(n)} and the core tensor G. In some applications, in a second stage, the core tensor is approximately factorized using the CPD as G ≅ ∑_{r=1}^{R} a_r ◦ b_r ◦ c_r. (b) Graphical representation of the Tucker and CP decompositions in a two-stage procedure for a 4th-order tensor as X ≅ G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_4 B^{(4)} = ⟦G; B^{(1)}, B^{(2)}, B^{(3)}, B^{(4)}⟧ ≅ (Λ ×_1 A^{(1)} ×_2 A^{(2)} · · · ×_4 A^{(4)}) ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_4 B^{(4)} = ⟦Λ; B^{(1)}A^{(1)}, B^{(2)}A^{(2)}, B^{(3)}A^{(3)}, B^{(4)}A^{(4)}⟧.

Similarly, we can define PARALIND/CONFAC-(K, N) models², described as [42]

X ≅ G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)}
  = ⟦I; B^{(1)}Φ^{(1)}, . . . , B^{(K)}Φ^{(K)}, B^{(K+1)}, . . . , B^{(N)}⟧,   (15)

where the core tensor, called the constrained tensor or interaction tensor, is expressed as

G = I ×_1 Φ^{(1)} ×_2 Φ^{(2)} · · · ×_K Φ^{(K)},   (16)

with K ≤ N. The factor matrices Φ^{(n)} ∈ R^{R_n×R}, with R ≥ max(R_n), are constrained matrices, often called interaction matrices (see Fig. 16 (b)).

Another important, more complex constrained CPD model, which can be represented graphically as a nested

²PARALIND is an abbreviation of PARAllel profiles with LINear Dependencies, while CONFAC means CONstrained FACtor model (for more detail see [42] and references therein).


Figure 16: Graphical illustration of constrained Tucker and CPD models: (a) Tucker-(K, N) decomposition of an Nth-order tensor, with N ≥ K, X ≅ G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_K B^{(K)} ×_{K+1} I ×_{K+2} · · · ×_N I = ⟦G; B^{(1)}, B^{(2)}, . . . , B^{(K)}⟧; (b) constrained CPD model, called PARALIND/CONFAC-(K, N), X ≅ G ×_1 B^{(1)} ×_2 B^{(2)} · · · ×_N B^{(N)} = ⟦I; B^{(1)}Φ^{(1)}, B^{(2)}Φ^{(2)}, . . . , B^{(K)}Φ^{(K)}, B^{(K+1)}, . . . , B^{(N)}⟧, where the core tensor is G = I ×_1 Φ^{(1)} ×_2 Φ^{(2)} · · · ×_K Φ^{(K)}, with K ≤ N.

Tucker-(K, N) model, is the PARATUCK-(K, N) model (see the review paper of Favier and de Almeida [42] and references therein).

D. Multiway Component Analysis Using Constrained Tucker Decompositions

A great success of 2-way component analysis (PCA, ICA, NMF, SCA) is largely due to the various constraints we can impose. Without constraints, matrix factorization loses most of its sense, as the components are rather arbitrary and do not have any physical meaning. There are various constraints that lead to all kinds of component analysis methods which are able to give unique components with some desired physical meaning and properties, and hence serve different application purposes. Similarly to matrix factorization, unconstrained Tucker decompositions generally serve only as multiway data compression, as their results lack physical meaning. In most practical applications we need to consider constrained Tucker decompositions which can


provide multiple sets of essentially unique components with the desired physical interpretation and meaning. This is a direct extension of 2-way component analysis and is referred to as multiway component analysis (MWCA) [2].

The MWCA based on the Tucker-N model can be considered as a natural and simple extension of multilinear SVD and/or multilinear ICA, in which we apply any efficient CA/BSS algorithm to each mode, which provides essential uniqueness [41].

There are two different models to interpret and implement constrained Tucker decompositions for MWCA: (1) the columns of the component matrices B^{(n)} represent the desired latent variables, the core tensor G has the role of a “mixing process”, modeling the links among the components from different modes, while the data tensor X represents a collection of 1-D or 2-D mixing signals; (2) the core tensor represents the desired (but hidden) N-dimensional signal (e.g., a 3D MRI image or a 4D video), while the component matrices represent mixing or filtering processes through, e.g., time-frequency transformations or wavelet dictionaries [3].

The MWCA based on the Tucker-N model can be computed directly in two steps: (1) for n = 1, 2, . . . , N, perform model reduction and unfolding of the data tensor sequentially and apply a suitable set of CA/BSS algorithms to the reduced unfolding matrices X_{(n)}; in each mode we can apply different constraints and algorithms; (2) compute the core tensor using, e.g., the inversion formula G = X ×_1 B^{(1)†} ×_2 B^{(2)†} · · · ×_N B^{(N)†} [41]. This step is quite important because core tensors illuminate complex links among the multiple components in different modes [1].
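A minimal sketch of this two-step procedure in NumPy (ours, not from the paper): a per-mode component-analysis routine is applied to each unfolding, and the core is then recovered via mode-n products with pseudo-inverses. The model-reduction step is omitted, and the placeholder CA step here is a truncated SVD (PCA-like); as the text notes, a different CA/BSS algorithm (ICA, NMF, SCA) could be plugged in per mode.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, B, n):
    new_shape = (B.shape[0],) + tuple(s for i, s in enumerate(X.shape) if i != n)
    return np.moveaxis((B @ unfold(X, n)).reshape(new_shape), 0, n)

def mwca_tucker(X, ranks, ca_algorithm):
    """Step 1: estimate B^(n) from each unfolding; step 2: core via pseudo-inverses."""
    Bs = [ca_algorithm(unfold(X, n), R) for n, R in enumerate(ranks)]
    G = X
    for n, B in enumerate(Bs):
        G = mode_n_product(G, np.linalg.pinv(B), n)
    return G, Bs

def svd_components(Xn, R):
    """Placeholder CA step: R leading left singular vectors of the unfolding."""
    U, _, _ = np.linalg.svd(Xn, full_matrices=False)
    return U[:, :R]

X = np.random.rand(6, 7, 8)
G, Bs = mwca_tucker(X, (3, 3, 3), svd_components)
print(G.shape)  # (3, 3, 3)
```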

V. Block-wise Tensor Decompositions for Very Large-Scale Data

Large-scale tensors cannot be processed by commonly used computers, since not only does their size exceed the available working memory, but processing of huge data is also very slow. The basic idea is to partition a big data tensor into smaller blocks and then perform tensor-related operations block-wise, using a suitable tensor format (see Fig. 17). A data management system that divides the data tensor into blocks is an important approach for both processing and storing large datasets. The method is based on a decomposition of the original tensor dataset into small block tensors, which are approximated via TDs. Each block is approximated using a low-rank reduced tensor decomposition, e.g., the CPD or a Tucker decomposition.

There are three important steps in such an approach before we are able to generate an output: first, an effective tensor representation should be chosen for the input dataset; second, the resulting tensor needs to be partitioned into sufficiently small blocks stored on a distributed memory system, so that each block can fit into the main memory of a single machine;


Figure 17: Conceptual models for performing the Tucker decomposition (HOSVD) for large-scale 3rd-order data tensors by dividing the tensors into blocks: (a) along one mode (the largest dimension), with blocks X_k ≅ G ×_1 U^{(1)} ×_2 U^{(2)}_k ×_3 U^{(3)}, (k = 1, 2, . . . , K), and (b) along all modes, with blocks X_k ≅ G ×_1 U^{(1)}_{k_1} ×_2 U^{(2)}_{k_2} ×_3 U^{(3)}_{k_3}. The models can be used for anomaly detection by fixing the core tensor and some factor matrices and by monitoring the changes along one or more specific modes. First, we compute tensor decompositions for sampled (pre-selected) small blocks, and in the next step we analyze changes in specific factor matrices U^{(n)}.

and third, a suitable algorithm for the TD needs to be adapted so that it can take the blocks of the original tensor and still output the correct approximation, as if the tensor for the original dataset had not been partitioned [8]–[10].

Converting the input data tensor from its original format into this block-structured tensor format is straightforward, and needs to be performed as a preprocessing step. The resulting blocks should be saved into separate files on hard disks to allow efficient random or sequential access to all of the blocks, which is required by most TD and TN algorithms.

We have successfully applied such techniques to the CPD [10]. Experimental results indicate that our algorithms can not only process out-of-core data, but also achieve high computation speed and good performance.

VI. Multilinear SVD (MLSVD) for Large-Scale Problems

The MultiLinear Singular Value Decomposition (MLSVD), also called the higher-order SVD (HOSVD), can be considered as a special form of the Tucker decomposition [43], [44], in which all factor matrices B^{(n)} = U^{(n)} ∈ R^{I_n×I_n} are orthogonal and the core tensor G = S ∈ R^{I_1×I_2×···×I_N} is all-orthogonal (see Fig. 18).

We say that the core tensor is all-orthogonal if it

satisfies the following conditions:



Figure 18: Graphical illustration of the HOSVD. (a) The exact HOSVD and the truncated (approximative) HOSVD for a 3rd-order tensor as X ≅ S_t ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)}, using truncated SVDs. (b) Tensor network notation for the HOSVD of a 4th-order tensor, X ≅ S_t ×_1 U^{(1)} ×_2 U^{(2)} ×_3 U^{(3)} ×_4 U^{(4)}. All factor matrices U^{(n)} and the core tensor S_t are orthogonal; due to the orthogonality of the core tensor, the HOSVD is unique for a specific multilinear rank.

(1) All-orthogonality: slices in each mode are mutually orthogonal, e.g., for a 3rd-order tensor

⟨S_{:,k,:}, S_{:,l,:}⟩ = 0 for k ≠ l;   (17)

(2) Pseudo-diagonality: Frobenius norms of slices in each mode are decreasing with increasing running index,

||S_{:,k,:}||_F ≥ ||S_{:,l,:}||_F, k ≤ l.   (18)

These norms play a role similar to that of the singular values in the matrix SVD.

In practice, the orthogonal matrices U^{(n)} can be computed by the standard SVD or truncated SVD of the unfolded mode-n matrices X_{(n)} = U^{(n)} Σ_n V^{(n)T} ∈ R^{I_n × I_1···I_{n−1}I_{n+1}···I_N}. After obtaining the orthogonal matrices U^{(n)} of left singular vectors of X_{(n)}, for each n, we can compute the core tensor G = S as

S = X ×_1 U^{(1)T} ×_2 U^{(2)T} · · · ×_N U^{(N)T},   (19)

such that

X = S ×_1 U^{(1)} ×_2 U^{(2)} · · · ×_N U^{(N)}.   (20)

Due to the orthogonality of the core tensor S, its slices are mutually orthogonal; this reduces to diagonality in the matrix case.
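A minimal NumPy sketch of the (truncated) HOSVD just described: the left singular vectors of each mode-n unfolding give U^{(n)}, and the core follows from Eq. (19). Function names are ours, and the truncation ranks are chosen arbitrarily for the example.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, B, n):
    new_shape = (B.shape[0],) + tuple(s for i, s in enumerate(X.shape) if i != n)
    return np.moveaxis((B @ unfold(X, n)).reshape(new_shape), 0, n)

def hosvd(X, ranks):
    """Truncated HOSVD: U^(n) from SVDs of the unfoldings, core via Eq. (19)."""
    Us = []
    for n, R in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        Us.append(U[:, :R])
    S = X
    for n, U in enumerate(Us):
        S = mode_n_product(S, U.T, n)      # S = X x_1 U^(1)T ... x_N U^(N)T
    return S, Us

X = np.random.rand(10, 11, 12)
S, Us = hosvd(X, (4, 4, 4))
X_hat = S
for n, U in enumerate(Us):
    X_hat = mode_n_product(X_hat, U, n)    # reconstruction as in Eq. (20)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```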

In some applications we may use a modified HOSVD in which the SVD is performed not on the unfolding mode-n matrices X_{(n)} but on their transposes, i.e., X_{(n)}^T ≅ V^{(n)} Σ_n U^{(n)T}. This leads to a modified HOSVD corresponding to Grassmann manifolds [45], which requires the computation of very large (tall-and-skinny) orthogonal factor matrices V^{(n)} ∈ R^{Ī_n×I_n}, where Ī_n = ∏_{k≠n} I_k = I_1 · · · I_{n−1} I_{n+1} · · · I_N, and the core tensor S ∈ R^{I_1×I_2×···×I_N}, based on the following model:

X = S ×_1 V^{(1)T} ×_2 V^{(2)T} · · · ×_N V^{(N)T}.   (21)

In practical applications, the dimensions of the unfolding matrices X_{(n)} ∈ R^{I_n×Ī_n} may be prohibitively large (with Ī_n ≫ I_n), easily exceeding the memory of standard computers. A truncated SVD of a large-scale unfolding matrix X_{(n)} = U^{(n)} Σ_n V^{(n)T} is performed by partitioning it into Q slices, as X_{(n)} = [X_{1,n}, X_{2,n}, . . . , X_{Q,n}] = U^{(n)} Σ_n [V_{1,n}^T, V_{2,n}^T, . . . , V_{Q,n}^T]. Next, the orthogonal matrices U^{(n)} and the diagonal matrices Σ_n are obtained from the eigenvalue decompositions X_{(n)} X_{(n)}^T = U^{(n)} Σ_n^2 U^{(n)T} = ∑_q X_{q,n} X_{q,n}^T ∈ R^{I_n×I_n}, allowing the terms V_{q,n} = X_{q,n}^T U^{(n)} Σ_n^{−1} to be computed separately. This allows us to optimize the size of the q-th slice X_{q,n} ∈ R^{I_n×(Ī_n/Q)} so as to match the available computer memory. Such a simple approach to computing the matrices U^{(n)} and/or V^{(n)} does not require loading the entire unfolding matrices at once into computer memory; instead, the access to the dataset is sequential. Depending on the size of the computer memory, the dimension I_n is typically less than 10,000, while there is no limit on the dimension Ī_n = ∏_{k≠n} I_k. More sophisticated approaches, which also exploit the partition of matrices or tensors into blocks for QR/SVD, PCA, NMF/NTF and ICA, can be found in [10], [29], [46]–[48].
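A sketch of this slice-wise computation in NumPy (our illustration only): the Gram matrix X_{(n)} X_{(n)}^T is accumulated over slices, so no more than one slice X_{q,n} needs to be held in memory at a time; in an out-of-core setting each slice would be read from disk inside the loop.

```python
import numpy as np

def left_factor_from_slices(slices, R):
    """Compute U^(n) and singular values from slices X_{q,n} of the unfolding."""
    G = None
    for Xq in slices:                       # accumulate X_(n) X_(n)^T = sum_q X_q X_q^T
        G = Xq @ Xq.T if G is None else G + Xq @ Xq.T
    eigvals, U = np.linalg.eigh(G)          # eigendecomposition of the small I_n x I_n Gram matrix
    order = np.argsort(eigvals)[::-1]       # sort eigenpairs in decreasing order
    sigma = np.sqrt(np.maximum(eigvals[order], 0.0))
    return U[:, order][:, :R], sigma[:R]

# toy check against a direct SVD of the full unfolding
X_n = np.random.rand(20, 1000)
slices = np.array_split(X_n, 10, axis=1)
U_approx, sigma = left_factor_from_slices(slices, R=5)
U_direct = np.linalg.svd(X_n, full_matrices=False)[0][:, :5]
print(np.allclose(np.abs(U_approx.T @ U_direct), np.eye(5), atol=1e-6))  # agree up to sign
```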

When a data tensor X is very large and cannot be stored in computer memory, another challenge is to compute the core tensor G = S directly, using the formula

G = X ×_1 U^{(1)T} ×_2 U^{(2)T} · · · ×_N U^{(N)T},   (22)

which is generally performed sequentially, as illustrated in Fig. 19 (a) and (b) [8], [9].

For very large tensors it is useful to divide the data tensor X into small blocks X_{[k_1,k_2,...,k_N]} in order to store them on hard disks or in distributed memory. In a similar way, we can divide the orthogonal factor matrices U^{(n)T} into corresponding blocks of matrices U^{(n)T}_{[k_n,p_n]}, as illustrated in Fig. 19 (c) for 3rd-order tensors [9].



Figure 19: Computation of the core tensor for a large-scale HOSVD: (a) using sequential computation of multilinear products G = S = (((X ×_1 U^{(1)T}) ×_2 U^{(2)T}) ×_3 U^{(3)T}); (b) by applying a fast and distributed implementation of matrix-by-matrix multiplications; (c) an alternative method for very large-scale problems, applying a divide-and-conquer approach in which the data tensor X and the factor matrices U^{(n)T} are partitioned into suitably small blocks: subtensors X_{[k_1,k_2,k_3]} and block matrices U^{(1)T}_{[k_1,p_1]}, respectively. We compute the blocks of the tensor Z = G^{(1)} = X ×_1 U^{(1)T} as Z_{[q_1,k_2,k_3]} = ∑_{k_1=1}^{K_1} X_{[k_1,k_2,k_3]} ×_1 U^{(1)T}_{[k_1,q_1]} (see Eq. (23) for the general case).

In the general case, we can compute the blocks of the resulting tensor G^{(n)} sequentially or in parallel, as follows:

G^{(n)}_{[k_1,k_2,...,q_n,...,k_N]} = ∑_{k_n=1}^{K_n} X_{[k_1,k_2,...,k_n,...,k_N]} ×_n U^{(n)T}_{[k_n,q_n]}.   (23)
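A small NumPy sketch of the block-wise update (23) for mode 1 of a 3rd-order tensor (ours, for illustration): blocks of X along mode 1 are multiplied by the corresponding blocks of U^{(1)T} and accumulated, so only one block pair is needed at a time.

```python
import numpy as np

def mode_n_product(X, B, n):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    new_shape = (B.shape[0],) + tuple(s for i, s in enumerate(X.shape) if i != n)
    return np.moveaxis((B @ Xn).reshape(new_shape), 0, n)

# full data tensor and factor, split into K1 blocks along mode 1
X = np.random.rand(12, 5, 6)
U1 = np.linalg.qr(np.random.rand(12, 4))[0]      # orthonormal U^(1), so U^(1)T is 4 x 12
K1 = 3
X_blocks = np.split(X, K1, axis=0)               # X_[k1] blocks of shape (4, 5, 6)
Ut_blocks = np.split(U1.T, K1, axis=1)           # U^(1)T_[k1] blocks of shape (4, 4)

# block-wise accumulation of Z = G^(1) = X x_1 U^(1)T, as in Eq. (23)
Z = np.zeros((U1.shape[1], 5, 6))
for Xk, Utk in zip(X_blocks, Ut_blocks):
    Z += mode_n_product(Xk, Utk, 0)

# check against the direct (in-memory) computation
assert np.allclose(Z, mode_n_product(X, U1.T, 0))
print(Z.shape)  # (4, 5, 6)
```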

If a data tensor has low multilinear rank {R_1, R_2, . . . , R_N}, with R_n ≪ I_n, ∀n, we can further alleviate the problem of dimensionality by first identifying a subtensor W ∈ R^{P_1×P_2×···×P_N}, for which P_n ≥ R_n, using efficient CUR tensor decompositions [49]. Then the HOSVD can be computed from subtensors, as illustrated in Fig. 20 for a 3rd-order tensor. This feature can be formulated in a more general form as the following proposition.

Proposition 1: If a tensor X ∈ R^{I_1×I_2×···×I_N} has low multilinear rank {R_1, R_2, . . . , R_N}, with R_n ≤ I_n, ∀n, then it can be fully reconstructed via the HOSVD using only N subtensors X^{(n)} ∈ R^{P_1×···×P_{n−1}×I_n×P_{n+1}×···×P_N} (n = 1, 2, . . . , N), under the condition that the subtensor W ∈ R^{P_1×P_2×···×P_N}, with P_n ≥ R_n, ∀n, has multilinear rank {R_1, R_2, . . . , R_N}.

Figure 20: Alternative approach to the computation of the HOSVD for a very large data tensor, exploiting a multilinear low-rank approximation. The objective is to select fibers (up to a permutation of fibers) such that the subtensor W ∈ R^{P_1×P_2×P_3}, with P_n ≥ R_n (n = 1, 2, 3), has the same multilinear rank {R_1, R_2, R_3} as the whole huge data tensor X, with R_n ≪ I_n. Instead of unfolding the whole data tensor X, we only need to perform unfoldings (and apply the standard SVD) for typically much smaller subtensors X^{(1)} = C ∈ R^{I_1×P_2×P_3}, X^{(2)} = R ∈ R^{P_1×I_2×P_3}, X^{(3)} = T ∈ R^{P_1×P_2×I_3}, each in a single mode n (n = 1, 2, 3). This approach can be applied if the data tensor admits a low multilinear rank approximation. For simplicity of illustration, we assumed that the fibers are permuted in such a way that the first P_1, P_2, P_3 fibers were selected.

In practice, we can compute the HOSVD for low-rank, large-scale data tensors in several steps. In the first step, we apply the CUR FSTD decomposition [49] to identify a close-to-optimal subtensor W ∈ R^(R1×R2×···×RN) (see the next Section). In the next step, we use the standard SVD of the unfolding matrices X(n)_(n) of the subtensors X(n) to compute the left orthogonal matrices U(n) ∈ R^(In×Rn). Hence, we compute an auxiliary core tensor G = W ×_1 B(1) · · · ×_N B(N), where the B(n) ∈ R^(Rn×Rn) are the inverses of the sub-matrices consisting of the first Rn rows of the matrices U(n). In the last step, we perform the HOSVD of the relatively small core tensor as G = S ×_1 Q(1) · · · ×_N Q(N), with Q(n) ∈ R^(Rn×Rn), and the desired orthogonal matrices are then computed as U(n) = U(n) Q(n).
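The following minimal NumPy sketch illustrates the multi-step procedure above on a toy tensor of exact low multilinear rank. For simplicity it assumes that the leading Rn indices in every mode already form a valid subtensor W (in practice these indices would come from the CUR/FSTD fiber selection of [49]); all sizes and helper names are illustrative.

import numpy as np

def mode_product(T, M, mode):
    # mode-n product T x_n M for a 3rd-order tensor
    T = np.moveaxis(T, mode, 0)
    return np.moveaxis(np.tensordot(M, T, axes=(1, 0)), 0, mode)

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

rng = np.random.default_rng(0)
I, R = (20, 21, 22), (3, 4, 5)               # toy sizes, exact multilinear rank R
X = rng.standard_normal(R)
for n in range(3):
    X = mode_product(X, rng.standard_normal((I[n], R[n])), n)

# Step 1 (toy stand-in for the CUR/FSTD fiber selection): take the leading
# R_n indices in every mode as the subtensor W; for generic data this
# subtensor has the required multilinear rank.
W = X[:R[0], :R[1], :R[2]]

U, B = [], []
for n in range(3):
    # Step 2: left singular vectors of the single-mode unfolding of X(n)
    # (all indices restricted to the first R_m except in mode n).
    idx = [slice(0, R[m]) for m in range(3)]; idx[n] = slice(None)
    Xn = X[tuple(idx)]
    Un = np.linalg.svd(unfold(Xn, n), full_matrices=False)[0][:, :R[n]]
    U.append(Un)
    B.append(np.linalg.inv(Un[:R[n], :]))    # Step 3: B(n) = (first R_n rows of U(n))^-1

# Step 4: auxiliary core G = W x_1 B(1) x_2 B(2) x_3 B(3)
G = W
for n in range(3):
    G = mode_product(G, B[n], n)

# Step 5: HOSVD of the small core G, and update U(n) <- U(n) Q(n)
Q = [np.linalg.svd(unfold(G, n))[0] for n in range(3)]
S_core = G
for n in range(3):
    S_core = mode_product(S_core, Q[n].T, n)
    U[n] = U[n] @ Q[n]

# Reconstruction check: X ~ S_core x_1 U(1) x_2 U(2) x_3 U(3)
X_hat = S_core
for n in range(3):
    X_hat = mode_product(X_hat, U[n], n)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # ~ machine precision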

VII. CUR Tucker Decomposition for Dimensionality Reduction and Compression of Tensor Data

Note that instead of using the full tensor, we may compute an approximate tensor decomposition model from a limited number of entries (e.g., selected fibers, slices or subtensors). Such completion-type strategies have been developed for low-rank and low-multilinear-rank approximations [50], [51]. A simple approach would be to apply



Figure 21: CUR decomposition for a huge matrix.

the CUR decomposition or Cross-Approximation by sampled fibers for the columns of factor matrices in a Tucker approximation [49], [52]. Another approach is to apply tensor networks to represent big data by high-order tensors not explicitly but in compressed tensor formats (see the next sections). Dimensionality reduction methods are based on the fundamental assumption that large datasets are highly redundant and can be approximated by low-rank matrices and cores, allowing for a significant reduction in computational complexity and the discovery of meaningful components, while incurring only a marginal loss of information.

For very large-scale matrices, the so-called CUR matrix decompositions can be employed for dimensionality reduction [49], [52]–[55]. Assuming a sufficiently precise low-rank approximation, which implies that the data has some internal structure or smoothness, the idea is to provide a data representation through a linear combination of a few "meaningful" components, which are exact replicas of columns and rows of the original data matrix [56].

The CUR model, also called skeleton Cross-Approximation, decomposes a data matrix X ∈ R^(I×J) as [53], [54] (see Fig. 21):

X = CUR + E, (24)

where C ∈ R^(I×C) is a matrix constructed from C suitably selected columns of the data matrix X, R ∈ R^(R×J) consists of R rows of X, and the matrix U ∈ R^(C×R) is chosen to minimize the norm of the error E ∈ R^(I×J). Since typically C ≪ J and R ≪ I, these columns and rows are chosen so as to exhibit high "statistical leverage" and provide the best low-rank fit to the data matrix, while at the same time the error cost function ||E||²_F is minimized. For a given set of columns (C) and rows (R), the optimal choice for the core matrix is U = C† X R†. This requires access to all the entries of X and is not practical or feasible for large-scale data. A pragmatic choice for the core matrix would be U = W†, where the matrix W ∈ R^(R×C) is defined by the intersections of the selected rows and columns. It should be noted that, if rank(X) ≤ min(C, R), then the CUR approximation is exact. For the general case, it has been proven that, when the intersection sub-matrix W is of maximum volume (the volume of a sub-matrix W is defined as |det(W)|), this approximation is close to the optimal SVD solution [54].
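A minimal NumPy sketch of the CUR model (24) with the pragmatic choice U = W† is given below; for clarity the rows and columns are sampled at random, which is only a toy stand-in for leverage-score or maximum-volume selection.

import numpy as np

rng = np.random.default_rng(1)
I, J, true_rank = 200, 150, 10
X = rng.standard_normal((I, true_rank)) @ rng.standard_normal((true_rank, J))

# Toy index selection: in practice the columns/rows would be chosen by
# statistical leverage or a maximum-volume criterion, not at random.
C_idx = rng.choice(J, size=15, replace=False)
R_idx = rng.choice(I, size=15, replace=False)

C = X[:, C_idx]               # I x C   (selected columns)
R = X[R_idx, :]               # R x J   (selected rows)
W = X[np.ix_(R_idx, C_idx)]   # R x C   (intersection submatrix)

U = np.linalg.pinv(W)         # pragmatic core: U = W^dagger
X_cur = C @ U @ R

# exact up to rounding, since rank(X) <= min(C, R) here
print("relative error:", np.linalg.norm(X - X_cur) / np.linalg.norm(X))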

Figure 22: (a) CUR decomposition of a large 3rd-order tensor (for simplicity of illustration, up to a permutation of fibers) X ≅ U ×_1 C ×_2 R ×_3 T = ⟦U; C, R, T⟧, where U = W ×_1 W†_(1) ×_2 W†_(2) ×_3 W†_(3) = ⟦W; W†_(1), W†_(2), W†_(3)⟧. (b) Equivalent decomposition expressed via the subtensor W. (c) Tensor network diagram illustrating the transformation from the CUR Tucker format (a) to the form (b) as X ≅ W ×_1 B(1) ×_2 B(2) ×_3 B(3) = ⟦W; C W†_(1), R W†_(2), T W†_(3)⟧.


The concept of CUR decomposition has been successfully generalized to tensors. In [52] the matrix CUR decomposition was applied to one unfolded version of the tensor data, while in [49] a reconstruction formula for a tensor having a low-rank Tucker approximation was proposed, termed the Fiber Sampling Tucker Decomposition (FSTD), which is a practical and fast technique. The FSTD takes into account the linear structure in all the modes of the tensor simultaneously. Since real-life data often have good low multilinear rank approximations, the FSTD provides such a low-rank Tucker decomposition that is directly expressed in terms of a relatively small number of fibers of the data tensor (see Fig. 22).

For a given 3rd-order tensor X ∈ R^(I1×I2×I3) for which an exact rank-(R1, R2, R3) Tucker representation exists, FSTD selects Pn ≥ Rn (n = 1, 2, 3) indices in each mode, which determine an intersection subtensor W ∈ R^(P1×P2×P3), so that the following exact Tucker representation can be obtained:

X = ⟦U; C, R, T⟧, (25)

in which the core tensor is computed as U = G = ⟦W; W†_(1), W†_(2), W†_(3)⟧, and the factor matrices C ∈ R^(I1×P2P3), R ∈ R^(I2×P1P3), T ∈ R^(I3×P1P2) contain the fibers (columns, rows and tubes, respectively). This can also be written as a Tucker representation:

X = ⟦W; C W†_(1), R W†_(2), T W†_(3)⟧. (26)

Observe that for N = 2 this model simplifies into the CUR matrix case, X = CUR, and the core matrix is U = ⟦W; W†_(1), W†_(2)⟧ = W† W W† = W†.
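The following toy NumPy sketch illustrates the FSTD reconstruction (25)–(26) for a 3rd-order tensor of exact low multilinear rank. The leading Pn indices are used as the selected fibers, which is a simplified stand-in for the fiber selection procedure of [49]; all sizes are illustrative.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    T = np.moveaxis(T, mode, 0)
    return np.moveaxis(np.tensordot(M, T, axes=(1, 0)), 0, mode)

rng = np.random.default_rng(2)
I, R, P = (15, 16, 17), (2, 3, 4), (3, 4, 5)   # sizes, multilinear rank, P_n >= R_n
X = rng.standard_normal(R)
for n in range(3):
    X = mode_product(X, rng.standard_normal((I[n], R[n])), n)

# Toy fiber selection: leading P_n indices per mode
W = X[:P[0], :P[1], :P[2]]
C  = unfold(X[:,      :P[1], :P[2]], 0)        # I1 x (P2 P3)  fibers (columns)
Rm = unfold(X[:P[0], :,      :P[2]], 1)        # I2 x (P1 P3)  fibers (rows)
T  = unfold(X[:P[0], :P[1], :      ], 2)       # I3 x (P1 P2)  fibers (tubes)

# Core tensor U = [[ W ; W_(1)^+, W_(2)^+, W_(3)^+ ]]
U = W
for n in range(3):
    U = mode_product(U, np.linalg.pinv(unfold(W, n)), n)

# Reconstruction X ~ [[ U ; C, R, T ]]  (Eq. (25))
X_hat = mode_product(mode_product(mode_product(U, C, 0), Rm, 1), T, 2)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))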

In a more general case, for an Nth-order tensor, we can formulate the following Proposition [49].

Proposition 2: If a tensor X ∈ R^(I1×I2×···×IN) has low multilinear rank {R1, R2, . . . , RN}, with Rn ≤ In, ∀n, then it can be fully reconstructed via the CUR FSTD X = ⟦U; C(1), C(2), . . . , C(N)⟧, using only N factor matrices C(n) ∈ R^(In×Pn), (n = 1, 2, . . . , N), built up from fibers of the data tensor, and a core tensor U = G = ⟦W; W†_(1), W†_(2), . . . , W†_(N)⟧, under the condition that the subtensor W ∈ R^(P1×P2×···×PN) with Pn ≥ Rn, ∀n, has multilinear rank {R1, R2, . . . , RN}.

An efficient strategy for the selection of suitable fibers, which only requires access to a partial (small) subset of entries of a data tensor by identifying the entries with maximum modulus within single fibers, is given in [49]. The indices are selected sequentially using a deflation approach, making the FSTD algorithm suitable for very large-scale but relatively low-order tensors (including tensors with missing fibers or entries).

VIII. Analysis of Coupled Multi-Block Tensor Data – Linked Multiway Component Analysis (LMWCA)

Group analysis or multi-block data analysis aims to identify links between hidden components in data, making it possible to analyze the correlation, variability and consistency of the components across multi-block data sets. This equips us with enhanced flexibility: some components do not necessarily need to be orthogonal or statistically independent, and can instead be sparse, smooth or non-negative (e.g., for spectral components). Additional constraints can be used to reflect the spatial distributions, spectral, or temporal patterns [3].

Consider the analysis of multi-modal high-dimensional data collected under the same or very similar conditions, for example, a set of EEG and MEG or fMRI signals recorded for different subjects over many trials and under the same experiment configuration and mental tasks. Such data share some common latent (hidden) components but can also have their own independent features. Therefore, it is quite important and necessary that they are analyzed in a linked way instead of independently.

Figure 23: Linked Multiway Component Analysis (LMWCA) for coupled multi-block 3rd-order tensors, with different dimensions in each mode except the first mode. The objective is to find the common components B(1)_C ∈ R^(I1×C1), where C1 ≤ R1 is the number of common components in mode 1.

The linked multiway component analysis (LMWCA) for multi-block tensor data is formulated as a set of approximate joint Tucker-(1, N) decompositions of a set of data tensors X(k) ∈ R^(I1(k)×I2(k)×···×IN(k)), with I1(k) = I1, ∀k (k = 1, 2, . . . , K) (see Fig. 23):

X(k) = G(k) ×_1 B(1,k), (k = 1, 2, . . . , K), (27)

where each factor (component) matrix B(1,k) = [B(1)_C, B(1,k)_I] ∈ R^(I1×R1) has two sets of components: (1) components B(1)_C ∈ R^(I1×C1) (with 0 ≤ C1 ≤ R1), which are common for all available blocks and correspond to identical or maximally correlated components, and (2) components B(1,k)_I ∈ R^(I1×(R1−C1)), which are different, independent processes (for example, latent variables independent of excitations or stimuli/tasks).


Figure 24: Conceptual models of generalized Linked Multiway Component Analysis (LMWCA) applied to tensor networks. The objective is to find core tensors which are maximally correlated, for (a) the Tensor Train and (b) Tensor Tree States (Hierarchical Tucker).

The objective is to estimate the common components B(1)_C and the independent (distinctive) components B(1,k)_I (see Fig. 23) [3]. If B(n,k) = B(n)_C ∈ R^(In×Rn) for a specific mode n (in our case n = 1), under the additional assumption that the tensors have the same dimensions, then the problem simplifies into generalized Common Component Analysis or tensor Population Value Decomposition (PVD) [57], and can be solved by concatenating all data tensors along one mode and performing constrained Tucker or CP tensor decompositions (see [57]).

In a more general case, when Cn < Rn, we can unfold each data tensor X(k) in the common mode and perform a set of linked and constrained matrix factorizations X(k)_(1) ≅ B(1)_C A(1,k)_C + B(1,k)_I A(1,k)_I, through solving the constrained optimization problems:

min Σ_{k=1}^{K} ||X(k)_(1) − B(1)_C A(1,k)_C − B(1,k)_I A(1,k)_I||_F + f1(B(1)_C), s.t. B(1)T_C B(1,k)_I = 0, ∀k, (28)

where f1 are penalty terms which impose additional constraints on the common components B(1)_C, in order to extract as many unique and desired components as possible. In a special case, when we impose orthogonality constraints, the problem can be transformed into a generalized eigenvalue problem and solved by the power method [11]. The key point is to assume that common factor sub-matrices B(1)_C are present in all data blocks and hence reflect structurally complex (hidden) latent and intrinsic links between them. In practice, the number of common components C1 in each mode is unknown and should be estimated (see [11] for details).
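As a toy illustration of the common/individual structure behind (27)–(28), the following NumPy sketch builds K coupled blocks sharing the components B(1)_C in mode 1 and estimates the common subspace from the leading left singular vectors of the concatenated mode-1 unfoldings. This is only a simplified surrogate for the constrained optimization (28) and for the algorithms of [11]; all names and sizes are hypothetical.

import numpy as np

rng = np.random.default_rng(3)
K, I1, I2, I3 = 5, 30, 8, 9          # K coupled 3rd-order blocks, shared mode-1 size I1
C1, R1 = 2, 4                        # C1 common + (R1 - C1) individual mode-1 components

# Mutually orthonormal ground-truth components (toy data):
# B_C is shared by all blocks, B_I[k] is specific to block k.
Q = np.linalg.qr(rng.standard_normal((I1, C1 + K * (R1 - C1))))[0]
B_C = Q[:, :C1]
B_I = [Q[:, C1 + k*(R1-C1): C1 + (k+1)*(R1-C1)] for k in range(K)]

# Each block: X(k) = G(k) x_1 [B_C, B_I(k)]   (cf. Eq. (27))
X = []
for k in range(K):
    G = rng.standard_normal((R1, I2, I3))
    B = np.hstack([B_C, B_I[k]])
    X.append(np.tensordot(B, G, axes=(1, 0)))

# Toy estimate of the common subspace: leading left singular vectors of the
# concatenated mode-1 unfoldings; the common directions aggregate energy
# over all K blocks, the individual ones only within a single block.
Y = np.hstack([Xk.reshape(I1, -1) for Xk in X])
B_C_hat = np.linalg.svd(Y, full_matrices=False)[0][:, :C1]

# Principal angles between estimated and true common subspaces
# (cosines close to 1 indicate good recovery of span(B_C)).
cosines = np.linalg.svd(B_C.T @ B_C_hat, compute_uv=False)
print("cos(principal angles):", cosines)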

The linked multiway component analysis model provides a quite flexible and general framework and thus supplements currently available techniques for group ICA and feature extraction for multi-block data. The LMWCA models are designed for blocks of K tensors, where the dimensions naturally split into several different modalities (e.g., time, space and frequency). In this sense, a multi-block multiway CA attempts to estimate both common and independent or uncorrelated components, and is a natural extension of group ICA, PVD, and CCA/PLS methods (see [3], [11], [12], [58] and references therein). The concept of LMWCA can be generalized to tensor networks, as illustrated in Fig. 24.

IX. Mathematical and Graphical Description of Tensor Train (TT) Decompositions

In this section we discuss in more detail the Tensor Train (TT) decompositions, which are the simplest tensor networks. The tensor train decomposition was introduced by Oseledets and Tyrtyshnikov [21], [59] and can take various forms depending on the order of the input data, as illustrated in Fig. 25.

Figure 25: Various forms of tensor train (TT) models. (Top) A scalar function can be expressed as x = aᵀ G(2) G(3) · · · G(M−1) b; (middle) the TT/MPS model of an Mth-order data tensor (multidimensional vector) is expressed by 3rd-order tensors and two factor matrices as X = ⟦A, G(2), G(3), . . . , G(M−1), B⟧; (bottom) the TT/MPO model of a 2Mth-order data tensor (multidimensional matrix) can be expressed by a chain of 3rd-order and 4th-order cores as X = ⟦A, G(2), G(3), . . . , G(M−1), B⟧.

The basic Tensor Train [21], [59], [60], also called the Matrix Product State (MPS) in quantum physics [61]–[64], decomposes a higher-order tensor into a set of 3rd-order core tensors and factor matrices, as illustrated in Figs. 26 and 27.


Figure 26: Illustration of the tensor train decomposition (TT/MPS) of a 4th-order data tensor X ∈ R^(I1×I2×I3×I4). (a) Tensor form via multilinear products of cores and/or the outer products of vectors (fibers), i.e., as a sum of rank-1 tensors: X ≅ G(1) ×^1_3 G(2) ×^1_3 G(3) ×^1_3 G(4) = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} Σ_{r3=1}^{R3} (g(1)_{1,r1} ∘ g(2)_{r1,r2} ∘ g(3)_{r2,r3} ∘ g(4)_{r3,1}) (for R1 = 3, R2 = 4, R3 = 5; R0 = R4 = 1). All vectors (fibers) g(n)_{rn−1,rn} ∈ R^(In) are considered as column vectors. (b) Scalar form via slice matrices: x_{i1,i2,i3,i4} ≅ G(1)(i1) G(2)(i2) G(3)(i3) G(4)(i4) = Σ_{r1,r2,r3} g(1)_{1,i1,r1} g(2)_{r1,i2,r2} g(3)_{r2,i3,r3} g(4)_{r3,i4,1}.

Note that the TT model is equivalent to the MPS only if the MPS has open boundary conditions (OBC) [65], [66].

The tensor train (TT/MPS) for an Nth-order data tensor X ∈ R^(I1×I2×···×IN) can be described in several equivalent mathematical forms, as follows.

• In a compact tensor form using multilinear products:

X ≅ A ×^1_2 G(2) ×^1_3 G(3) ×^1_3 · · · ×^1_3 G(N−1) ×^1_3 B = ⟦A, G(2), G(3), . . . , G(N−1), B⟧, (29)

where the 3rd-order cores are defined as G(n) ∈ R^(Rn−1×In×Rn) for n = 2, 3, . . . , N − 1 (see Fig. 26 (a)).

• By unfolding the cores G(n) and suitably reshaping matrices, we can obtain other very useful mathematical and graphical descriptions of the MPS, for example, as a summation of rank-1 tensors using the outer (tensor) product (similar to the CPD, Tucker and PARATREE formats):

X ≅ Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,r1} ∘ g(2)_{r1,r2} ∘ g(3)_{r2,r3} ∘ · · · ∘ g(N)_{rN−1,1}, (30)

where g(n)_{rn−1,rn} ∈ R^(In) are column vectors of the matrices G(n)_(2) = [g(n)_{1,1}, g(n)_{2,1}, . . . , g(n)_{Rn−1,1}, g(n)_{1,2}, . . . , g(n)_{Rn−1,Rn}] ∈ R^(In×Rn−1Rn) (n = 1, 2, . . . , N), with R0 = RN = 1. Note that g(1)_{1,r1} = a_{r1} are columns of the matrix A = [a1, a2, . . . , aR1] ∈ R^(I1×R1), while g(N)_{rN−1,1} = b_{rN−1} are columns of the transposed factor matrix Bᵀ = [b1, b2, . . . , bRN−1] ∈ R^(IN×RN−1) (see Fig. 26 (a)). The minimal (N − 1)-tuple {R1, R2, . . . , RN−1} is called the TT-rank (strictly speaking, for the exact TT decomposition).


Figure 27: Alternative representation of the tensor train decomposition (TT/MPS), expressed via strong Kronecker products of block matrices in the form of a vector: x_{(i1,i2,i3,i4)} ≅ G(1) |⊗| G(2) |⊗| G(3) |⊗| G(4) ∈ R^(I1 I2 I3 I4), where the block matrices are defined as G(n) ∈ R^(Rn−1 In×Rn), with block vectors g(n)_{rn−1,rn} ∈ R^(In×1) for n = 1, 2, 3, 4 and R0 = R4 = 1. For illustrative purposes, we assumed that N = 4, R1 = 3, R2 = 4 and R3 = 5.

• Alternatively, we can use the standard scalar form:

x_{i1,i2,...,iN} ≅ Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,i1,r1} g(2)_{r1,i2,r2} · · · g(N)_{rN−1,iN,1}, (31)

or, equivalently, slice representations (see Fig. 26):

x_{i1,i2,...,iN} ≅ G(1)(i1) G(2)(i2) · · · G(N)(iN) = g(1)T(i1) G(2)(i2) · · · g(N)(iN), (32)

where the slice matrices G(n)(in) = G(n)(:, in, :) ∈ R^(Rn−1×Rn) (with G(1)(i1) = g(1)T(i1) ∈ R^(1×R1) and G(N)(iN) = g(N)(iN) ∈ R^(RN−1×1)) are lateral slices of the cores G(n) ∈ R^(Rn−1×In×Rn) for n = 1, 2, . . . , N, with R0 = RN = 1 (a short numerical sketch of this slice-product evaluation is given after this list).

• By representing the cores G(n) ∈ R^(Rn−1×In×Rn) by unfolding matrices G(n) = (G(n)_(3))ᵀ ∈ R^(Rn−1 In×Rn) for n = 1, 2, . . . , N (with R0 = RN = 1), and considering them as block matrices with blocks g(n)_{rn−1,rn} ∈ R^(In×1), we can express the TT/MPS in matrix form via strong Kronecker products [67]–[69] (see Fig. 27 and Fig. 28):

x_{(i1,i2,...,iN)} ≅ G(1) |⊗| G(2) |⊗| · · · |⊗| G(N), (33)

where the vector x_{(i1,i2,...,iN)} = x_{(1:N)} ∈ R^(I1 I2···IN) denotes the vectorization of the tensor X in lexicographical order of indices, and |⊗| denotes the strong Kronecker product.
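The slice representation (32) directly suggests how a single entry of a tensor in TT format is evaluated: multiply the corresponding lateral slices from left to right. A small NumPy sketch with randomly generated cores (all sizes illustrative):

import numpy as np

rng = np.random.default_rng(4)
I, R = (4, 5, 6, 7), (1, 3, 4, 5, 1)      # mode sizes and TT-ranks (R0 = R4 = 1)
# TT/MPS cores G(n) of size R_{n-1} x I_n x R_n (random, for illustration)
cores = [rng.standard_normal((R[n], I[n], R[n + 1])) for n in range(4)]

def tt_entry(cores, index):
    # x_{i1,...,iN} = G(1)(i1) G(2)(i2) ... G(N)(iN)   (Eq. (32))
    out = np.eye(1)
    for G, i in zip(cores, index):
        out = out @ G[:, i, :]            # multiply the lateral slices (R_{n-1} x R_n)
    return out.item()                     # final product is 1 x 1

def tt_full(cores):
    # reconstruct the full tensor (only feasible for small toy sizes)
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))
    return T.squeeze(axis=(0, -1))

X = tt_full(cores)
print(np.isclose(X[1, 2, 3, 4], tt_entry(cores, (1, 2, 3, 4))))   # True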

The strong Kronecker product of two block matrices (e.g., of two unfolded cores) A = [A_{r1,r2}] ∈ R^(R1 I1×R2 J1) and


B = [B_{r2,r3}] ∈ R^(R2 I2×R3 J2) is defined as the block matrix

C = A |⊗| B ∈ R^(R1 I1 I2×R3 J1 J2), (34)

with blocks C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3} ∈ R^(I1 I2×J1 J2), where A_{r1,r2} ∈ R^(I1×J1) and B_{r2,r3} ∈ R^(I2×J2) are the blocks of A and B, respectively (see also Fig. 28 for a graphical illustration).
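A small NumPy sketch of the strong Kronecker product (34), with the block matrices stored simply as nested lists of blocks (a storage format chosen only for illustration):

import numpy as np

def strong_kron(A_blocks, B_blocks):
    # Strong Kronecker product |x| of two block matrices (Eq. (34)).
    # A_blocks: R1 x R2 grid of (I1 x J1) blocks; B_blocks: R2 x R3 grid of (I2 x J2) blocks.
    # Result: R1 x R3 block matrix with blocks C[r1,r3] = sum_r2 A[r1,r2] (x) B[r2,r3].
    R1, R2 = len(A_blocks), len(A_blocks[0])
    R3 = len(B_blocks[0])
    C_blocks = [[sum(np.kron(A_blocks[r1][r2], B_blocks[r2][r3]) for r2 in range(R2))
                 for r3 in range(R3)] for r1 in range(R1)]
    return np.block(C_blocks)

rng = np.random.default_rng(5)
I1, J1, I2, J2 = 2, 3, 4, 2
R1, R2, R3 = 2, 3, 2
A = [[rng.standard_normal((I1, J1)) for _ in range(R2)] for _ in range(R1)]
B = [[rng.standard_normal((I2, J2)) for _ in range(R3)] for _ in range(R2)]

C = strong_kron(A, B)
print(C.shape)    # (R1*I1*I2, R3*J1*J2) = (16, 12)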

Figure 28: Illustration of the definition of the strong Kronecker product of two block matrices. The strong Kronecker product of A = [A_{r1,r2}] ∈ R^(R1 I1×R2 J1) and B = [B_{r2,r3}] ∈ R^(R2 I2×R3 J2) is defined as the block matrix C = A |⊗| B ∈ R^(R1 I1 I2×R3 J1 J2), with blocks C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3} ∈ R^(I1 I2×J1 J2), for r1 = 1, 2; r2 = 1, 2, 3 and r3 = 1, 2.

The matrix strong Kronecker product can be generalized to block tensors as follows. Let A = [A_{r1,r2}] ∈ R^(R1 I1×R2 J1×K1) and B = [B_{r2,r3}] ∈ R^(R2 I2×R3 J2×K2) be R1×R2 and R2×R3 block tensors, where the blocks A_{r1,r2} ∈ R^(I1×J1×K1) and B_{r2,r3} ∈ R^(I2×J2×K2) are 3rd-order tensors; then the strong Kronecker product of A and B is defined as the R1×R3 block tensor

C = [C_{r1,r3}] = A |⊗| B ∈ R^(R1 I1 I2×R3 J1 J2×K1 K2), (35)

where C_{r1,r3} = Σ_{r2=1}^{R2} A_{r1,r2} ⊗ B_{r2,r3} ∈ R^(I1 I2×J1 J2×K1 K2), for r1 = 1, 2, . . . , R1 and r3 = 1, 2, . . . , R3.

Another important TT model, called the matrix TT or MPO (Matrix Product Operator with Open Boundary Conditions), consists of a chain (train) of 3rd-order and 4th-order cores, as illustrated in Fig. 29. Note that a 3rd-order tensor can be represented equivalently as a block (column or row) vector in which each element (block) is a matrix (lateral slice) of the tensor, while a 4th-order tensor can be represented equivalently as a block matrix. The TT/MPO model for a 2Nth-order tensor X ∈ R^(I1×I2×···×I2N) can be described mathematically in the following general forms (see also Table III).

• A) In a compact tensor form using multilinear products:

X ≅ G(1) ×^1_4 G(2) ×^1_4 · · · ×^1_4 G(N) = ⟦G(1), G(2), . . . , G(N)⟧, (36)

where the cores are defined as G(n) ∈ R^(Rn−1×I2n−1×I2n×Rn), with R0 = RN = 1 (n = 1, 2, . . . , N).

• B) Using the standard (rather long and tedious) scalar form:

x_{i1,i2,...,i2N} ≅ Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} · · · Σ_{rN−1=1}^{RN−1} g(1)_{1,i1,i2,r1} g(2)_{r1,i3,i4,r2} · · · g(N−1)_{rN−2,i2N−3,i2N−2,rN−1} g(N)_{rN−1,i2N−1,i2N,1}. (37)

• C) By matrix representations of the cores, the TT/MPO decomposition can be expressed via strong Kronecker products (see Fig. 29):

X_{(i1,i3,...,i2N−1; i2,i4,...,i2N)} ≅ G(1) |⊗| G(2) |⊗| · · · |⊗| G(N), (38)

where X_{(i1,i3,...,i2N−1; i2,i4,...,i2N)} ∈ R^(I1 I3···I2N−1 × I2 I4···I2N) is an unfolding matrix of X in lexicographical order of indices, and the G(n) ∈ R^(Rn−1 I2n−1×Rn I2n) are block matrices with blocks G(n)_{rn−1,rn} ∈ R^(I2n−1×I2n) and Rn−1 × Rn blocks. In the special case when the TT/MPO ranks Rn = 1, ∀n, the strong Kronecker products simplify to standard Kronecker products.

The Tensor Train (TT) format [59] can be interpreted as a special case of the HT [7], where all nodes of the underlying tensor network are aligned and where, moreover, the leaf matrices are assumed to be identities (and thus need not be stored). An advantage of the TT format is its simpler practical implementation using the SVD or alternative low-rank matrix approximations, as no binary tree need be involved [21], [70] (see Figs. 30 and 31).

Two different types of approaches to perform tensor approximation via TT exist [26]. The first class of methods is based on combining standard iterative algorithms with low-rank decompositions, such as SVD/QR, CUR or Cross-Approximations. Similar to the Tucker decomposition, the TT and HT decompositions are usually based on low-rank approximation of generalized unfolding matrices X([n]), and a good approximation in a decomposition for a given TT/HT-rank can be obtained using the truncated SVDs of the unfolding matrices [59], [70].
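A basic TT-SVD sketch in this first class of methods is given below: the TT cores are obtained by sequential truncated SVDs of reshaped unfoldings, in the spirit of the SVD-based algorithms of [59]. The tolerance-based rank selection and all sizes are illustrative.

import numpy as np

def tt_svd(X, eps=1e-10):
    # TT/MPS decomposition via sequential truncated SVDs of reshaped unfoldings
    shape = X.shape
    N = len(shape)
    cores, r_prev = [], 1
    M = X.reshape(r_prev * shape[0], -1)
    for n in range(N - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))        # truncation by relative tolerance
        cores.append(U[:, :r].reshape(r_prev, shape[n], r))
        M = (s[:r, None] * Vt[:r]).reshape(r * shape[n + 1], -1)
        r_prev = r
    cores.append(M.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))
    return T.squeeze(axis=(0, -1))

# Toy test: a 5th-order tensor with exact low TT-rank
rng = np.random.default_rng(6)
G_true = [rng.standard_normal((1, 4, 3)), rng.standard_normal((3, 4, 3)),
          rng.standard_normal((3, 4, 3)), rng.standard_normal((3, 4, 3)),
          rng.standard_normal((3, 4, 1))]
X = tt_full(G_true)
cores = tt_svd(X)
print([G.shape for G in cores])
print(np.linalg.norm(X - tt_full(cores)) / np.linalg.norm(X))   # ~ machine precision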

In [55] Oseledets and Tyrtyshnikov proposed for the TT decomposition a new approximate formula in which an Nth-order data tensor is interpolated using a special form of Cross-Approximation, which is a modification of the CUR algorithm. The total number of entries and the complexity of the interpolation algorithm depend linearly on the order N of the data tensor, so the developed algorithm does not suffer from the curse of dimensionality. The TT-Cross-Approximation is analogous to the SVD/HOSVD-like algorithms for TT/MPS, but uses adaptive cross-approximation instead of the computationally more expensive SVD.


Figure 29: The matrix tensor train decomposition (TT/MPO) for an 8th-order data tensor, or equivalently a multidimensional matrix of size (I1 I3 I5 I7) × (I2 I4 I6 I8), expressed by a chain of 4th-order cores as X ≅ G(1) ×^1_4 G(2) ×^1_4 G(3) ×^1_4 G(4) = ⟦G(1), G(2), G(3), G(4)⟧, or in scalar form as x_{i1,i2,...,i8} ≅ Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} Σ_{r3=1}^{R3} g(1)_{1,i1,i2,r1} g(2)_{r1,i3,i4,r2} g(3)_{r2,i5,i6,r3} g(4)_{r3,i7,i8,1}. Alternatively, the TT/MPO decomposition can be expressed in a compact and elegant matrix form as a strong Kronecker product of block matrices G(n) ∈ R^(Rn−1 I2n−1×Rn I2n) (with blocks G(n)_{rn−1,rn} ∈ R^(I2n−1×I2n)) as X_{(i1,i3,i5,i7; i2,i4,i6,i8)} ≅ G(1) |⊗| G(2) |⊗| G(3) |⊗| G(4). For simplicity, we assumed that R1 = 3, R2 = 4, R3 = 5 and n = 1, 2, 3, 4.

The second class of algorithms is based on the optimization of suitably designed cost functions, often with additional penalty or regularization terms. Optimization techniques include gradient descent, conjugate gradient, and Newton-like methods (see [26], [71] and references therein). Gradient descent methods often lead to Alternating Least Squares (ALS) type algorithms, which can be improved in various ways [71], [73] (see Fig. 32).

A quite successful improvement in TNs (TT, HT) is the DMRG method. It joins two neighboring factors (cores), optimizes the resulting "supernode", and splits the result into separate factors by a low-rank matrix factorization [65], [66], [71], [72] (see Fig. 33).

Remark: In most optimization problems it is very convenient to present the TT in a canonical form, in which all cores are left- or right-orthogonal [71], [74] (see also Fig. 32 and Fig. 33).

The Nth-order core tensor is called right-orthogonal if

G_(1) G_(1)ᵀ = I. (39)

Analogously, the Nth-order core tensor is called left-orthogonal if

G_(N) G_(N)ᵀ = I. (40)

In contrast, for an all-orthogonal core tensor we have G_(n) G_(n)ᵀ = I, ∀n.
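The sketch below produces such a canonical form for a toy TT by a left-to-right QR sweep (absorbing the triangular factors into the next core, which leaves the represented tensor unchanged) and then verifies an orthogonality condition of the form (40) for the 3rd-order cores; sizes are illustrative.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def left_orthogonalize(cores):
    # Left-to-right QR sweep: every core except the last becomes left-orthogonal.
    cores = [G.copy() for G in cores]
    for n in range(len(cores) - 1):
        r0, i, r1 = cores[n].shape
        Q, Rf = np.linalg.qr(cores[n].reshape(r0 * i, r1))           # thin QR
        cores[n] = Q.reshape(r0, i, Q.shape[1])
        cores[n + 1] = np.tensordot(Rf, cores[n + 1], axes=(1, 0))   # absorb R into next core
    return cores

rng = np.random.default_rng(7)
I, R = (4, 5, 6), (1, 3, 4, 1)
cores = [rng.standard_normal((R[n], I[n], R[n + 1])) for n in range(3)]
orth = left_orthogonalize(cores)

# Verify G_(3) G_(3)^T = I (cf. Eq. (40) for 3rd-order cores) for the swept cores
for G in orth[:-1]:
    G3 = unfold(G, 2)          # mode-3 unfolding, R_n x (R_{n-1} I_n)
    print(np.allclose(G3 @ G3.T, np.eye(G3.shape[0])))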


TABLE III: Different forms of the Tensor Train (TT): MPS and MPO (with OBC) representations of an Nth-order tensor X ∈ R^(I1×I2×···×IN) and a 2Nth-order tensor Y ∈ R^(I1×J1×I2×J2×···×IN×JN), respectively. It is assumed that the TT rank is {R1, R2, . . . , RN−1}, with R0 = RN = 1 (r0 = rN = 1).

Scalar (standard) representations:
TT/MPS: x_{i1,i2,...,iN} = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,i1,r1} g(2)_{r1,i2,r2} g(3)_{r2,i3,r3} · · · g(N)_{rN−1,iN,1}, where g(n)_{rn−1,in,rn} are entries of a 3rd-order core G(n) ∈ R^(Rn−1×In×Rn).
TT/MPO: y_{i1,j1,i2,j2,...,iN,jN} = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,i1,j1,r1} g(2)_{r1,i2,j2,r2} · · · g(N)_{rN−1,iN,jN,1}, where g(n)_{rn−1,in,jn,rn} are entries of a 4th-order core G(n) ∈ R^(Rn−1×In×Jn×Rn).

Slice representations:
TT/MPS: x_{i1,i2,...,iN} = G(1)(i1) G(2)(i2) · · · G(N−1)(iN−1) G(N)(iN), where G(n)(in) ∈ R^(Rn−1×Rn) are lateral slices of the cores G(n).
TT/MPO: y_{i1,j1,i2,j2,...,iN,jN} = G(1)(i1, j1) G(2)(i2, j2) · · · G(N)(iN, jN), where G(n)(in, jn) ∈ R^(Rn−1×Rn) are slices of the cores G(n).

Tensor representations (multilinear products / tensor contractions):
TT/MPS: X = G(1) ×^1_3 G(2) ×^1_3 · · · ×^1_3 G(N−1) ×^1_3 G(N), with G(n) ∈ R^(Rn−1×In×Rn), (n = 1, 2, . . . , N).
TT/MPO: Y = G(1) ×^1_4 G(2) ×^1_4 · · · ×^1_4 G(N−1) ×^1_4 G(N), with G(n) ∈ R^(Rn−1×In×Jn×Rn).

Tensor representations (outer products):
TT/MPS: X = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,r1} ∘ g(2)_{r1,r2} ∘ · · · ∘ g(N−1)_{rN−2,rN−1} ∘ g(N)_{rN−1,1}, where g(n)_{rn−1,rn} ∈ R^(In) are blocks of the matrix G(n) = (G(n)_(3))ᵀ ∈ R^(Rn−1 In×Rn).
TT/MPO: Y = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} G(1)_{1,r1} ∘ G(2)_{r1,r2} ∘ · · · ∘ G(N−1)_{rN−2,rN−1} ∘ G(N)_{rN−1,1}, where G(n)_{rn−1,rn} ∈ R^(In×Jn) are blocks of the matrix G(n) ∈ R^(Rn−1 In×Rn Jn).

Vector/matrix representations (Kronecker and strong Kronecker products):
TT/MPS: x_{(i1,...,iN)} = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} g(1)_{1,r1} ⊗ g(2)_{r1,r2} ⊗ · · · ⊗ g(N)_{rN−1,1} = G(1) |⊗| G(2) |⊗| · · · |⊗| G(N) ∈ R^(I1 I2···IN), where G(n) ∈ R^(Rn−1 In×Rn) is a block matrix with blocks g(n)_{rn−1,rn} ∈ R^(In).
TT/MPO: Y_{(i1,...,iN; j1,...,jN)} = Σ_{r1,r2,...,rN−1=1}^{R1,R2,...,RN−1} G(1)_{1,r1} ⊗ G(2)_{r1,r2} ⊗ · · · ⊗ G(N)_{rN−1,1} = G(1) |⊗| G(2) |⊗| · · · |⊗| G(N) ∈ R^(I1···IN×J1···JN), where G(n) ∈ R^(Rn−1 In×Rn Jn) is a block matrix with blocks G(n)_{rn−1,rn} ∈ R^(In×Jn).

TT-Rounding. TT-rounding (also called truncation or recompression) [55] is a post-processing procedure to reduce the TT ranks, which after the first stage of applying low-rank matrix factorizations are usually not optimal with respect to the desired approximation error. The optimal computation of a TT-tensor is generally impossible without TT-rounding. A tensor already expressed in the TT format is approximated by another TT-tensor with smaller TT-ranks but with a prescribed accuracy of approximation ε. The most popular TT-rounding algorithm is based on the QR/SVD algorithm, which requires O(NIR³) operations [21], [75]. In practice, we avoid explicit construction of these matrices and the SVDs when truncating a tensor in the TT decomposition to lower TT-rank. Such truncation algorithms for TT are described by Oseledets in [59]. In fact, the method exploits a micro-iterations algorithm in which the SVD is performed only on a relatively small core at each iteration. A similar approach has been developed by Grasedyck for the HT [76]. HT/TT algorithms that avoid the explicit computation of these SVDs when truncating a tensor that is already in tensor network format are discussed in [26], [72], [77].

Figure 30: Illustration of the SVD algorithm for TT/MPS for a 5th-order tensor. Instead of the truncated SVD, we can employ any low-rank matrix factorization, especially QR, CUR, SCA, ICA, NMF.

The TT Toolbox developed by Oseledets (http://spring.inm.ras.ru/osel/?page_id=24) is focused on TT structures and deals with the curse of dimensionality [75]. The Hierarchical Tucker (HT) toolbox by Kressner and Tobler (http://www.sam.math.ethz.ch/NLAgroup/htucker_toolbox.html) and the Calculus library by Hackbusch, Waehnert and Espig focus mostly on HT and TT tensor networks [25], [75], [77].

Figure 31: SVD algorithm for TT/MPO for a 6th-order tensor. Instead of the SVD we can use alternative low-rank approximations (constrained matrix factorizations, e.g., CUR or NMF).

See also the recently developed TDALAB (http://bsp.brain.riken.jp/TDALAB) and TENSORBOX (http://www.bsp.brain.riken.jp/∼phan), which provide user-friendly interfaces and advanced algorithms for selected TD (Tucker, CPD) models [78], [79]. The Tensorlab toolbox (http://www.esat.kuleuven.be/sista/tensorlab/) builds upon a complex optimization framework and offers efficient numerical algorithms for computing TDs with various constraints (e.g., nonnegativity, orthogonality) and the possibility to combine and jointly factorize dense, sparse and incomplete tensors [80]. Problems related to the optimization of existing TN/TD algorithms are an active area of research [26], [71], [73], [81], [82].

X. Hierarchical Outer Product Tensor Approximation (HOPTA) and Kronecker Tensor Decompositions

Recent advances in TDs/TNs include TT/HT [26], [60], [71], PARATREE [83], the Block Term Decomposition (BTD) [84], the Hierarchical Outer Product Tensor Approximation (HOPTA) and the Kronecker Tensor Decomposition (KTD) [85]–[87].


Figure 32: Extension of the ALS algorithm for the TT decomposition. The idea is to optimize only one core tensor at a time (by minimizing a suitable cost function), while keeping the others fixed. The optimization of each core tensor is followed by an orthogonalization step via the QR or the more expensive SVD decomposition. The factor matrices R are absorbed (incorporated) into the following core.

HOPTA and KTD models can be expressed mathematically in simple nested (hierarchical) forms, respectively (see Fig. 34 and Fig. 35):

X ≅ Σ_{r=1}^{R} (A_r ∘ B_r), (41)

X = Σ_{r=1}^{R} (A_r ⊗ B_r), (42)

where each factor tensor can itself be represented recursively as A_r ≅ Σ_{r1=1}^{R1} (A(1)_{r1} ∘ B(1)_{r1}) or B_r ≅ Σ_{r2=1}^{R2} A(2)_{r2} ∘ B(2)_{r2}, etc.

Figure 33: Modified ALS (MALS) algorithm, related to the Density Matrix Renormalization Group (DMRG), for the TT decomposition. In each optimization step, two neighboring cores are merged and an optimization is performed over the merged "supercore". After this optimization, in the next step we apply the truncated SVD or other low-rank matrix factorizations (LRA) to separate the optimized supercore. For example, for a nonnegative TT the SVD steps can be replaced by a nonnegative matrix factorization (NMF) algorithm. Note that each optimization sub-problem is more expensive than in the standard ALS and the complexity increases, but the convergence speed may increase dramatically (see also [71], [72]).

The Kronecker product of two tensors A ∈ R^(I1×I2×···×IN) and B ∈ R^(J1×J2×···×JN) yields C = A ⊗ B ∈ R^(I1 J1×···×IN JN), with entries c_{i1⊗j1,...,iN⊗jN} = a_{i1,...,iN} b_{j1,...,jN}, where the index operator ⊗ for in = 1, 2, . . . , In and jn = 1, 2, . . . , Jn is defined as in ⊗ jn = jn + (in − 1)Jn (see Fig. 34).

Note that the 2Nth-order subtensors A_r ∘ B_r and A_r ⊗ B_r actually have the same elements, arranged differently. For example, if X = A ∘ B and X′ = A ⊗ B, where A ∈ R^(J1×J2×···×JN) and B ∈ R^(K1×K2×···×KN), then x_{j1,j2,...,jN,k1,k2,...,kN} = x′_{k1+K1(j1−1),...,kN+KN(jN−1)}.
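The index mapping in ⊗ jn = jn + (in − 1)Jn corresponds, for 0-based indices, to in·Jn + jn, so the tensor Kronecker product can be computed by an outer product followed by interleaving and merging of axes, as in the following NumPy sketch (sizes illustrative):

import numpy as np

def tensor_kron(A, B):
    # Kronecker product of two Nth-order tensors:
    # C[i1 (x) j1, ..., iN (x) jN] = A[i1,...,iN] * B[j1,...,jN]
    N = A.ndim
    C = np.tensordot(A, B, axes=0)                       # shape I1..IN, J1..JN
    order = [ax for n in range(N) for ax in (n, N + n)]  # interleave in with jn
    C = C.transpose(order)
    return C.reshape([A.shape[n] * B.shape[n] for n in range(N)])

rng = np.random.default_rng(8)
A = rng.standard_normal((2, 3, 4))
B = rng.standard_normal((3, 2, 2))
C = tensor_kron(A, B)

# spot-check one entry against the index formula (0-based: kn = in*Jn + jn)
i, j = (1, 2, 3), (2, 1, 0)
k = tuple(i[n] * B.shape[n] + j[n] for n in range(3))
print(np.isclose(C[k], A[i] * B[j]), C.shape)   # True, (6, 6, 8)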

Figure 34: The Kronecker product of two 4th-order tensors yields a tensor C = A ⊗ B ∈ R^(I1 J1×···×I4 J4), with entries c_{k1,k2,...,k4} = a_{i1,...,i4} b_{j1,...,j4}, where kn = in ⊗ jn = jn + (in − 1)Jn (n = 1, 2, 3, 4).


Figure 35: Illustration of the Hierarchical Outer Product Tensor Approximation (HOPTA) for higher-order data tensors of different orders. Each component tensor A_r, B_r and/or C_r can be further decomposed using a suitable tensor network model. The model can be considered as an extension or generalization of the Block Term Decomposition (BTD) model to higher-order tensors.

Figure 36: Illustration of the decomposition of a 6th-order tensor using the BTD of rank-(Lr, Lr, 1) as X = Σ_{r=1}^{R} A_r ∘ (b(1)_r ∘ b(2)_r ∘ b(3)_r) [36], which can be considered as a special case of HOPTA.

It is interesting to note that the KTD and HOPTA can, in special cases, be considered as a flexible form of the Block Term Decomposition (BTD), introduced first by De Lathauwer [36], [84], [88], [89].

The definition of the tensor Kronecker product assumes that both core tensors A_r and B_r have the same order. It should be noted that vectors and matrices can be treated as tensors; e.g., a matrix of dimension I × J can be treated formally as a 3rd-order tensor of dimension I × J × 1. In fact, from the KTD model, we can generate many existing and emerging TDs by changing the structures and orders of the factor tensors A_r and B_r, for example:
• If A_r are rank-1 tensors of size I1 × I2 × · · · × IN, and B_r are scalars, ∀r, then (42) expresses the rank-R CPD.
• If A_r are rank-Lr tensors in the Kruskal (CP) format, of size I1 × I2 × · · · × IR × 1 × · · · × 1, and B_r are rank-1 CP tensors of size 1 × · · · × 1 × IR+1 × · · · × IN, ∀r, then (42) expresses the rank-(Lr ∘ 1) BTD [84].
• If A_r and B_r are expressed by KTDs, we have the Nested Kronecker Tensor Decomposition (NKTD), of which the Tensor Train (TT) decomposition is a particular case [59], [60], [90]. In fact, the model (42) can be used for recursive TT decompositions [59].

In this way, a large variety of tensor decomposition models can be generated. However, only some of them yield unique decompositions, and to date only a few have found concrete applications in scientific computing.

The advantage of HOPTA models over BTD and KTD is that they are more flexible, can approximate very high-order tensors with a relatively small number of cores, and allow us to model more complex data structures.

XI. Tensorization and Quantization – Blessing of Dimensionality

A. Curse of Dimensionality

The term curse of dimensionality, in the context of tensors, refers to the fact that the number of elements of an Nth-order (I × I × · · · × I) tensor, I^N, grows exponentially with the tensor order N, so tensors can easily become prohibitively large for high orders. For example, for the Tucker decomposition not only the number of entries of the original data tensor but also that of the core tensor scales exponentially with the tensor order; for instance, the number of entries of an Nth-order (R × R × · · · × R) core tensor is R^N.

If all computations are performed in a CP tensor format and not on the raw data tensor itself, then instead of the original I^N raw data entries, the number of parameters in a CP representation reduces to NRI, which scales linearly in N and I (see Table IV). This effectively bypasses the curse of dimensionality; however, the CP approximation may involve numerical problems, since existing algorithms are not stable for high-order tensors. At the same time, existing algorithms for tensor networks, especially TT/HT, ensure very good numerical properties (in contrast to CPD algorithms), making it possible to control the approximation error, i.e., to achieve a desired accuracy of approximation [60].

B. Quantized Tensor Networks

The curse of dimensionality can be overcome through quantized tensor networks, which represent a tensor of possibly very high order as a set of sparsely interconnected low-order cores of very small dimensions [60], [91]. The concept of quantized tensor networks was first proposed by Khoromskij [92] and Oseledets [91].


Figure 37: Tensorization. (a) Illustration of the concept of tensorization of a (large-scale) vector (I = 2^K) or matrix (2^L × 2^L) into a higher-order tensor, in order to achieve super-compression through a suitable quantized tensor decomposition, e.g., a decomposition into rank-1 tensors X ≅ Σ_{r=1}^{R} b(1)_r ∘ b(2)_r ∘ · · · ∘ b(6)_r, rank-q terms using the Hierarchical Outer Product Tensor Approximation (HOPTA) as X ≅ Σ_{r=1}^{R} A_r ∘ B_r ∘ C_r, or the Quantized Tensor Train (QTT). (b) Symbolic representation of the tensorization of a vector x ∈ R^I into a Kth-order quantized tensor X ∈ R^(2×2×···×2). (c) Tensorization of a 3rd-order tensor X ∈ R^(I1×I2×I3) into a (K1 + K2 + K3)th-order tensor Y ∈ R^(I1,1×···×I1,K1×I2,1×···×I3,K3) with In,kn = q.

The very low-dimensional cores are interconnected via tensor contractions to provide an efficient, highly compressed, low-rank representation of a data tensor.

The procedure of creating a data tensor from lower-order original data is referred to as tensorization. In other words, lower-order data tensors can be reshaped (reformatted) into high-order tensors. The purpose of such tensorization is to achieve super-compression [92]. In general, very large-scale vectors or matrices can easily be tensorized to higher-order tensors and then efficiently compressed by applying a suitable TT decomposition; this is the underlying principle for big data analysis [91], [92]. For example, the quantization and tensorization of a huge vector x ∈ R^I, I = 2^K, can be achieved through reshaping to give a (2 × 2 × · · · × 2) tensor X of order K, as illustrated in Figure 37 (a). Such a quantized tensor X often admits low-rank approximations, so that a good compression of a huge vector x can be achieved by enforcing a maximum possible low-rank structure on the tensor network.
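A small NumPy illustration of such quantization: a vector of length 2^K sampled from a smooth signal is reshaped into a Kth-order (2 × 2 × · · · × 2) tensor, and the ranks of its unfoldings, which bound the QTT ranks, stay very small. The particular signal is only an example.

import numpy as np

K = 10
I = 2 ** K
t = np.linspace(0.0, 1.0, I)
x = np.sin(2 * np.pi * 5 * t)            # a smooth, highly structured signal

X = x.reshape([2] * K)                   # quantized tensor X of order K

# The QTT ranks are bounded by the ranks of the unfoldings of X;
# for smooth/structured signals these ranks stay very small (here 2).
for n in range(1, K):
    M = X.reshape(2 ** n, -1)
    print(f"unfolding {n}: shape {M.shape}, rank {np.linalg.matrix_rank(M, tol=1e-10)}")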

Even more generally, an Nth-order tensor X ∈ R^(I1×···×IN), with In = q^Kn, can be quantized in all modes simultaneously to yield a (q × q × · · · × q) quantized tensor Y of higher order, with small q (see Fig. 37 (c) and Fig. 38).

In the example shown in Fig. 38, the Tensor Train of a huge 3rd-order tensor can be represented by the strong Kronecker products of block tensors with relatively small 3rd-order blocks.

Recall that the strong Kronecker product of two block cores G(n) = [G(n)_{rn−1,rn}] ∈ R^(Rn−1 I3n−2×Rn I3n−1×I3n) and G(n+1) = [G(n+1)_{rn,rn+1}] ∈ R^(Rn I3n+1×Rn+1 I3n+2×I3n+3) is defined as the block tensor

C = G(n) |⊗| G(n+1) ∈ R^(Rn−1 I3n−2 I3n+1×Rn+1 I3n−1 I3n+2×I3n I3n+3), (43)

with blocks C_{rn−1,rn+1} = Σ_{rn=1}^{Rn} G(n)_{rn−1,rn} ⊗ G(n+1)_{rn,rn+1}, where G(n)_{rn−1,rn} ∈ R^(I3n−2×I3n−1×I3n) and G(n+1)_{rn,rn+1} ∈ R^(I3n+1×I3n+2×I3n+3) are the block tensors of G(n) and G(n+1), respectively.

In practice, a fine (q = 2, 3, 4) quantization is desirable to create as many virtual modes as possible, thus allowing us to implement efficient low-rank tensor approximations. For example, the binary encoding (q = 2) reshapes an Nth-order tensor with (2^K1 × 2^K2 × · · · × 2^KN) elements into a tensor of order (K1 + K2 + · · · + KN) with the same number of elements. In other words, the idea of the quantized tensor is the quantization of each nth physical mode (dimension) by replacing it with Kn virtual modes, provided that the corresponding mode size In can be factorized as In = In,1 In,2 · · · In,Kn. This corresponds to reshaping the nth mode of size In into Kn modes of sizes In,1, In,2, . . . , In,Kn.

The TT decomposition applied to quantized tensors is referred to as the QTT; it was first introduced as a compression scheme for large-scale matrices [91], and also independently for more general settings [69], [74], [92]–[94].


Figure 38: Example of tensorization and decomposition of a large-scale 3rd-order tensor X into a 12th-order tensor, assuming that the original mode sizes factorize as I1 I4 I7 I10, I2 I5 I8 I11 and I3 I6 I9 I12: (a) tensorization and (b) representation of the higher-order tensor via a generalized Tensor Train, referred to as Tensor Product States (TPS). The data tensor can be expressed by the strong Kronecker product of block tensors as X ≅ G(1) |⊗| G(2) |⊗| G(3) |⊗| G(4), where each block of the core G(n) ∈ R^(Rn−1 I3n−2×Rn I3n−1×I3n) is a 3rd-order tensor of dimensions (I3n−2 × I3n−1 × I3n), with R0 = R4 = 1, for n = 1, 2, 3, 4.

An attractive property of the QTT is that not only is its rank typically small (below 10), but it is almost independent of, or at least uniformly bounded by, the data size (even for I = 2^50), providing a logarithmic (sub-linear) reduction of storage requirements: O(I^N) → O(N log_q(I)), the so-called super-compression [92].

Compared to the TT decomposition (without quantization), the QTT format often captures deeper structure in the data by introducing some "virtual" dimensions. The high compressibility of the QTT approximation is a consequence of the noticeable separability properties in the quantized tensor for suitably structured data.

The fact that the TT/QTT ranks are often moderate or even low, e.g., constant or growing linearly with respect to N and constant or growing logarithmically with respect to I, is an important issue in the context of big data analytics and has so far been addressed mostly experimentally (see [26], [60], [92] and references therein). On the other hand, the high efficiency of multilinear algebra in the TT/QTT algorithms is based on the well-posedness of the TT low-rank approximation problems and the fact that such problems can be solved quite efficiently with the use of SVD/QR, CUR and other cross-approximation techniques.


TABLE IV: Storage complexities of tensor decomposition models for an Nth-order tensor X ∈ R^(I1×I2×···×IN), for which the original storage complexity is O(I^N), where I = max{I1, I2, . . . , IN}, while R is an upper bound on the ranks of the tensor decompositions considered, that is, R = max{R1, R2, . . . , RN−1} or R = max{R1, R2, . . . , RN}.

1. CPD: O(NIR)
2. Tucker: O(NIR + R^N)
3. TT/MPS: O(NIR²)
4. TT/MPO: O(NI²R²)
5. Quantized TT/MPS (QTT): O(NR² log_q(I))
6. QTT+Tucker: O(NR² log_q(I) + NR³)
7. Hierarchical Tucker (HT): O(NIR + NR³)

In general, tensor networks can be considered as distributed high-dimensional tensors built up from many core tensors of low dimension through specific tensor contractions. Indeed, tensor networks (TT, MPS, MPO, PEPS and HT) have already been successfully used to solve intractable problems in computational quantum chemistry and in scientific computing [63], [68], [69], [94]–[96].

However, in some cases the ranks of the TT or QTT formats grow quite significantly as the approximation accuracy is increased. To overcome this problem, new tensor approximation models were developed; e.g., Dolgov and Khoromskij proposed the QTT-Tucker format [74] (see Fig. 39), which exploits the TT approximation not only for the Tucker core tensor but also the QTT for the factor matrices. This model allows distributed computing, often with bounded ranks, and avoids the curse of dimensionality. For very large-scale tensors we can apply a more advanced approach in which the factor matrices are tensorized to higher-order tensors and then represented by TTs, as illustrated in Fig. 39.

Modern methods of tensor approximation combine many TN and TD formats, including the CPD, BTD, Tucker, TT and HT decompositions and the HOPTA (see Fig. 35) low-parametric tensor format. The concept of tensorization and the representation of a very high-order tensor in tensor network formats (TT/QTT, HT, QTT-Tucker) allow us to treat efficiently very large-scale structured data that admit low-rank tensor network approximations. TT/QTT/HT tensor networks have already found promising applications in very large-scale problems in scientific computing, such as eigenanalysis, super-fast Fourier transforms, and solving huge systems of large linear equations (see [26], [66], [74], [93], [97] and references therein).


Figure 39: The QTT-Tucker format. (a) Distributed representation of a large matrix A_n ∈ R^(In×Rn) with a large dimension In via the QTT, by tensorization of the matrix into a high-order quantized tensor followed by a QTT decomposition. (b) Distributed representation of a large-scale Nth-order Tucker model via a quantized TT model, in which the core tensor and all large-scale factor matrices A_n (n = 1, 2, . . . , N) are represented by tensor trains [74].

In summary, the main approach is to apply a suitable tensorization and quantization of the tensor data, then perform an approximate decomposition of the data into a tensor network, and finally perform all computations (tensor/matrix/vector operations, optimizations) in tensor network formats.

XII. Conclusions and Future Directions

Tensor networks can be considered as a generalization and extension of TDs and are promising tools for the analysis of big data due to their extremely good compression abilities and distributed and parallel processing. TDs have already found applications in generalized multivariate regression, multi-way blind source separation, sparse representation and coding, feature extraction, classification, clustering and data assimilation.


Unique advantages of tensor networks include the potential ability of tera- or even peta-byte scaling and distributed fault-tolerant computations.

Overall, the benefits of multiway (tensor) analysis methods can be summarized as follows:
• "Super" compression of huge multidimensional, structured data which admit a low-rank approximation via TNs of high-order tensors, by extracting factor matrices and/or core tensors of low rank and low order and performing all mathematical manipulations in tensor formats (especially the TT and HT formats).
• A compact and very flexible approximate representation of structurally rich data, by accounting for their spatio-temporal and spectral dependencies.
• The opportunity to establish statistical links between cores, factors, components or hidden latent variables for blocks of data.
• The possibility to operate with noisy, incomplete or missing data by using powerful low-rank tensor/matrix approximation techniques.
• A framework to incorporate various diversities or constraints in different modes, and thus naturally extend the standard (2-way) CA methods to large-scale multidimensional data.

Many challenging problems related to low-rank tensor approximations remain to be addressed:
• A whole new area emerges when several TNs which operate on different datasets are coupled or linked.
• As the complexity of big data increases, more efficient iterative algorithms for their computation are required, extending beyond the ALS, MALS/DMRG, SVD/QR and CUR/Cross-Approximation classes of algorithms.
• Methodological approaches are needed to determine the kind of constraints that should be imposed on cores to extract desired hidden (latent) variables with meaningful physical interpretation.
• We need methods to reliably estimate the ranks of TNs, especially for structured data corrupted by noise and outliers.
• The uniqueness of various TN models under different constraints needs to be investigated.
• Special techniques are needed for distributed computing and for saving and processing huge ultra large-scale tensors.
• Better visualization tools need to be developed to address large-scale tensor network representations.

In summary, TNs and TDs form a fascinating and promising area of research with many potential applications in the multi-modal analysis of massive big data sets.

Acknowledgments: The author wishes to express his appreciation and gratitude to his colleagues, especially Drs. Namgill LEE, Danilo MANDIC, Qibin ZHAO, Cesar CAIAFA, Anh Huy PHAN, Andre ALMEIDA, Qiang WU, Guoxu ZHOU and Chao LI, for reading the manuscript and providing comments and suggestions.

REFERENCES

[1] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester: Wiley, 2009.

[2] A. Cichocki, D. Mandic, C. Caiafa, A.-H. Phan, G. Zhou, Q. Zhao, and L. De Lathauwer, "Multiway Component Analysis: Tensor Decompositions for Signal Processing Applications," (in print), 2013.

[3] A. Cichocki, "Tensor decompositions: New concepts for brain data analysis?" Journal of Control, Measurement, and System Integration (SICE), vol. 47, no. 7, pp. 507–517, 2011.

[4] T. Kolda and B. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, September 2009.

[5] P. Kroonenberg, Applied Multiway Data Analysis. New York: John Wiley & Sons Ltd, 2008.

[6] A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences. New York: John Wiley & Sons Ltd, 2004.

[7] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, ser. Springer Series in Computational Mathematics. Heidelberg: Springer, 2012, vol. 42.

[8] S. K. Suter, M. Makhynia, and R. Pajarola, "TAMRESH – tensor approximation multiresolution hierarchy for interactive volume visualization," Comput. Graph. Forum, vol. 32, no. 3, pp. 151–160, 2013.

[9] H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja, "Out-of-core tensor approximation of multi-dimensional matrices of visual data," ACM Trans. Graph., vol. 24, no. 3, pp. 527–535, 2005.

[10] A.-H. Phan and A. Cichocki, "PARAFAC algorithms for large-scale problems," Neurocomputing, vol. 74, no. 11, pp. 1970–1984, 2011.

[11] G. Zhou, A. Cichocki, S. Xie, and D. Mandic, "Beyond Canonical Correlation Analysis: Common and individual features analysis," ArXiv, 2013. [Online]. Available: http://arxiv.org/abs/1212.3913, 2012.

[12] T. Yokota, A. Cichocki, and Y. Yamashita, "Linked PARAFAC/CP tensor decomposition and its fast implementation for multi-block tensor analysis," in Neural Information Processing. Springer, 2012, pp. 84–91.

[13] P. Comon, X. Luciani, and A. L. F. de Almeida, "Tensor decompositions, Alternating Least Squares and other Tales," Jour. Chemometrics, vol. 23, pp. 393–405, 2009.

[14] H. Lu, K. Plataniotis, and A. Venetsanopoulos, "A survey of multilinear subspace learning for tensor data," Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.

[15] M. Mørup, "Applications of tensor (multiway array) factorizations and decompositions in data mining," Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 24–40, 2011.

[16] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010.

[17] F. De la Torre, "A least-squares framework for component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 34, no. 6, pp. 1041–1055, 2012.

[18] S. Dolgov and D. Savostyanov, "Alternating minimal energy methods for linear systems in higher dimensions. Part I: SPD systems," arXiv preprint arXiv:1301.6068, 2013.

[19] ——, "Alternating minimal energy methods for linear systems in higher dimensions. Part II: Faster algorithm and application to nonsymmetric systems," arXiv preprint arXiv:1304.1222, 2013.

[20] J. Ballani, L. Grasedyck, and M. Kluge, "Black box approximation of tensors in hierarchical Tucker format," Linear Algebra and its Applications, vol. 438, no. 2, pp. 639–657, 2013.

[21] I. Oseledets and E. Tyrtyshnikov, "Breaking the curse of dimensionality, or how to use SVD in many dimensions," SIAM J. Scientific Computing, vol. 31, no. 5, pp. 3744–3759, 2009.

[22] A. Uschmajew and B. Vandereycken, "The geometry of algorithms using hierarchical tensors," Linear Algebra and its Applications, vol. 439, 2013.

[23] L. Grasedyck and W. Hackbusch, "An introduction to hierarchical (H-) rank and TT-rank of tensors with examples," Comput. Meth. in Appl. Math., vol. 11, no. 3, pp. 291–304, 2011.

[24] C. Tobler, “Low-rank Tensor Methods for Linear Systems and Eigenvalue Problems,” Ph.D. dissertation, ETH Zurich, Zurich, Switzerland, 2012.

[25] D. Kressner and C. Tobler, “Algorithm 941: htucker - A Matlab Toolbox for Tensors in Hierarchical Tucker Format,” ACM Trans. Math. Softw., vol. 40, no. 3, p. 22, 2014.

[26] L. Grasedyck, D. Kressner, and C. Tobler, “A literature survey of low-rank tensor approximation techniques,” GAMM-Mitteilungen, vol. 36, pp. 53–78, 2013.

[27] A. Cichocki, “Generalized Component Analysis and Blind Source Separation Methods for Analyzing Multichannel Brain Signals,” in Statistical and Process Models for Cognitive Neuroscience and Aging. Lawrence Erlbaum Associates, 2007, pp. 201–272.

[28] V. Calhoun, J. Liu, and T. Adali, “A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data,” Neuroimage, vol. 45, pp. 163–172, 2009.

[29] G. Zhou, A. Cichocki, and S. Xie, “Fast Nonnegative Matrix/Tensor Factorization Based on Low-Rank Approximation,” IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2928–2940, June 2012.

[30] F. L. Hitchcock, “Multiple invariants and generalized rank of a p-way matrix or tensor,” Journal of Mathematics and Physics, vol. 7, pp. 39–79, 1927.

[31] R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.

[32] J. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of ’Eckart-Young’ decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, September 1970.

[33] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gershman, “Robust iterative fitting of multilinear models,” IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2678–2689, 2005.

[34] A. Cichocki and H. A. Phan, “Fast local algorithms for large scale nonnegative matrix and tensor factorizations,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E92-A, no. 3, pp. 708–721, 2009.

[35] A.-H. Phan, P. Tichavsky, and A. Cichocki, “Low complexity Damped Gauss-Newton algorithms for CANDECOMP/PARAFAC,” SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 34, no. 1, pp. 126–147, 2013.

[36] L. Sorber, M. Van Barel, and L. De Lathauwer, “Optimization-based algorithms for tensor decompositions: Canonical Polyadic Decomposition, decomposition in rank-(Lr, Lr, 1) terms and a new generalization,” SIAM J. Optimization, vol. 23, no. 2, 2013.

[37] N. Sidiropoulos and R. Bro, “On the uniqueness of multilinear decomposition of N-way arrays,” J. Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.

[38] M. Sørensen, L. De Lathauwer, P. Comon, S. Icart, and L. Deneire, “Canonical Polyadic Decomposition with orthogonality constraints,” SIAM J. Matrix Anal. Appl., vol. 33, no. 4, pp. 1190–1213, 2012.

[39] G. Zhou and A. Cichocki, “Canonical Polyadic Decomposition based on a single mode blind source separation,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 523–526, 2012.

[40] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, September 1966.

[41] G. Zhou and A. Cichocki, “Fast and unique Tucker decompositions via multiway blind source separation,” Bulletin of Polish Academy of Science, vol. 60, no. 3, pp. 389–407, 2012.

[42] G. Favier and A. L. F. de Almeida, “Overview of constrained PARAFAC models,” ArXiv e-prints, May 2014. [Online]. Available: http://adsabs.harvard.edu/abs/2014arXiv1405.7442F

[43] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM Journal of Matrix Analysis and Applications, vol. 24, pp. 1253–1278, 2000.

[44] ——, “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., vol. 21, pp. 1324–1342, March 2000.

[45] Y. Lui, J. Beveridge, and M. Kirby, “Action classification on product manifolds,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 833–839.

[46] A. Benson, J. Lee, B. Rajwa, and D. Gleich, “Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matrices,” ArXiv e-prints, vol. abs/1402.6964, 2014. [Online]. Available: http://arxiv.org/abs/1402.6964

[47] A. Benson, D. Gleich, and J. Demmel, “Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures,” ArXiv e-prints, vol. abs/1301.1071, 2013. [Online]. Available: http://arxiv.org/abs/1301.1071

[48] M. Vasilescu and D. Terzopoulos, “Multilinear analysis of image ensembles: Tensorfaces,” in Proc. European Conf. on Computer Vision (ECCV), vol. 2350, Copenhagen, Denmark, May 2002, pp. 447–460.

[49] C. Caiafa and A. Cichocki, “Generalizing the column-row matrix decomposition to multi-way arrays,” Linear Algebra and its Applications, vol. 433, no. 3, pp. 557–573, 2010.

[50] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup, “Scalable tensor factorizations for incomplete data,” Chemometrics and Intelligent Laboratory Systems, vol. 106, no. 1, pp. 41–56, 2011. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?5923

[51] S. Gandy, B. Recht, and I. Yamada, “Tensor completion and low-n-rank tensor recovery via convex optimization,” Inverse Problems, vol. 27, no. 2, 2011.

[52] M. W. Mahoney, M. Maggioni, and P. Drineas, “Tensor-CUR decompositions for tensor-based data,” SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 957–987, 2008.

[53] S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtyshnikov, “Pseudo-skeleton approximations by matrices of maximum volume,” Mathematical Notes, vol. 62, no. 4, pp. 515–519, 1997.

[54] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin, “A theory of pseudo-skeleton approximations,” Linear Algebra Appl., vol. 261, pp. 1–21, 1997.

[55] I. Oseledets and E. Tyrtyshnikov, “TT-cross approximation for multidimensional arrays,” Linear Algebra and its Applications, vol. 432, no. 1, pp. 70–88, 2010.

[56] M. Mahoney and P. Drineas, “CUR matrix decompositions for improved data analysis,” Proc. National Academy of Science, vol. 106, pp. 697–702, 2009.

[57] A. Phan and A. Cichocki, “Tensor decompositions for feature extraction and classification of high dimensional datasets,” Nonlinear Theory and Its Applications, IEICE, vol. 1, no. 1, pp. 37–68, 2010.

[58] Q. Zhao, C. Caiafa, D. Mandic, Z. Chao, Y. Nagasaka, N. Fujii, L. Zhang, and A. Cichocki, “Higher-order partial least squares (HOPLS): A generalized multi-linear regression method,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 35, no. 7, pp. 1660–1673, 2013.

[59] I. V. Oseledets, “Tensor-train decomposition,” SIAM J. Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.

[60] B. Khoromskij, “Tensors-structured numerical methods in scientific computing: Survey on recent advances,” Chemometrics and Intelligent Laboratory Systems, vol. 110, no. 1, pp. 1–19, 2011.

[61] D. Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac, “Matrix product state representations,” Quantum Info. Comput., vol. 7, no. 5, pp. 401–430, Jul. 2007. [Online]. Available: http://dl.acm.org/citation.cfm?id=2011832.2011833

[62] F. Verstraete, V. Murg, and J. Cirac, “Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems,” Advances in Physics, vol. 57, no. 2, pp. 143–224, 2008.

[63] R. Orus, “A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States,” Annals of Physics, 2014.

[64] U. Schollwock, “Matrix product state algorithms: DMRG, TEBD and relatives,” in Strongly Correlated Systems. Springer, 2013, pp. 67–98.

[65] ——, “The density-matrix renormalization group in the age of matrix product states,” Annals of Physics, vol. 326, no. 1, pp. 96–192, 2011.

[66] T. Huckle, K. Waldherr, and T. Schulte-Herbrüggen, “Computations in quantum tensor networks,” Linear Algebra and its Applications, vol. 438, no. 2, pp. 750–781, 2013.

[67] W. de Launey and J. Seberry, “The strong Kronecker product,” J. Comb. Theory, Ser. A, vol. 66, no. 2, pp. 192–213, 1994. [Online]. Available: http://dblp.uni-trier.de/db/journals/jct/jcta66.html#LauneyS94

[68] V. Kazeev, B. Khoromskij, and E. Tyrtyshnikov, “Multilevel Toeplitz matrices generated by tensor-structured vectors and convolution with logarithmic complexity,” SIAM J. Scientific Computing, vol. 35, no. 3, 2013.

[69] V. Kazeev, O. Reichmann, and C. Schwab, “Low-rank tensor structure of linear diffusion operators in the TT and QTT formats,” Linear Algebra and its Applications, vol. 438, no. 11, pp. 4204–4221, 2013.

[70] G. Vidal, “Efficient classical simulation of slightly entangled quantum computations,” Physical Review Letters, vol. 91, no. 14, p. 147902, 2003.

[71] S. Holtz, T. Rohwedder, and R. Schneider, “The alternating linear scheme for tensor optimization in the tensor train format,” SIAM J. Scientific Computing, vol. 34, no. 2, pp. 683–713, 2012.

[72] D. Kressner and C. Tobler, “htucker—A MATLAB toolbox for tensors in hierarchical Tucker format,” MATHICSE, EPF Lausanne, 2012. [Online]. Available: http://anchp.epfl.ch/htucker

[73] T. Rohwedder and A. Uschmajew, “On local convergence of alternating schemes for optimization of convex problems in the tensor train format,” SIAM Journal on Numerical Analysis, vol. 51, no. 2, pp. 1134–1162, 2013.

[74] S. Dolgov and B. Khoromskij, “Two-level QTT-Tucker format for optimized tensor calculus,” SIAM J. Matrix Analysis Applications, vol. 34, no. 2, pp. 593–623, 2013.

[75] I. Oseledets, “TT-toolbox 2.2,” 2012. [Online]. Available: https://github.com/oseledets/TT-Toolbox

[76] L. Grasedyck, “Hierarchical Singular Value Decomposition of tensors,” SIAM J. Matrix Analysis Applications, vol. 31, no. 4, pp. 2029–2054, 2010.

[77] M. Espig, M. Schuster, A. Killaitis, N. Waldren, P. Wahnert, S. Handschuh, and H. Auer, “Tensor Calculus library,” 2012. [Online]. Available: http://gitorious.org/tensorcalculus

[78] G. Zhou and A. Cichocki, “TDALAB: Tensor Decomposition Laboratory,” LABSP, Wako-shi, Japan, 2013. [Online]. Available: http://bsp.brain.riken.jp/TDALAB/

[79] A.-H. Phan, P. Tichavsky, and A. Cichocki, “Tensorbox: a MATLAB package for tensor decomposition,” LABSP, RIKEN, Japan, 2012. [Online]. Available: http://www.bsp.brain.riken.jp/~phan/tensorbox.php

[80] L. Sorber, M. Van Barel, and L. De Lathauwer, “Tensorlab v1.0,” Feb. 2013. [Online]. Available: http://esat.kuleuven.be/sista/tensorlab/

[81] N. Lee and A. Cichocki, “Fundamental tensor operations for large-scale data analysis in tensor train formats,” RIKEN, Brain Science Institute, LABSP, Tech. Rep., 2014. [Online]. Available: http://www.bsp.brain.riken.jp/

[82] Q. Wu and A. Cichocki, “Algorithms for tensor train networks,” RIKEN, Brain Science Institute, LABSP, Tech. Rep., 2013. [Online]. Available: http://www.bsp.brain.riken.jp/

[83] J. Salmi, A. Richter, and V. Koivunen, “Sequential unfolding SVD for tensors with applications in array signal processing,” IEEE Transactions on Signal Processing, vol. 57, pp. 4719–4733, 2009.

[84] L. De Lathauwer, “Decompositions of a higher-order tensor in block terms – Part I and II,” SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 30, no. 3, pp. 1022–1066, 2008, Special Issue on Tensor Decompositions and Applications. [Online]. Available: http://publi-etis.ensea.fr/2008/De08e

[85] A. H. Phan, A. Cichocki, P. Tichavsky, D. Mandic, and K. Matsuoka, “On Revealing Replicating Structures in Multiway Data: A Novel Tensor Decomposition Approach,” in Proc. 10th International Conf. LVA/ICA, Tel Aviv, March 12–15, 2012, pp. 297–305.

[86] S. Ragnarsson, “Structured tensor computations: Blocking, symmetries and Kronecker factorizations,” PhD Dissertation, Cornell University, Department of Applied Mathematics, 2012.

[87] A.-H. Phan, A. Cichocki, P. Tichavsky, R. Zdunek, and S. Lehky, “From basis components to complex structural patterns,” in Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, 2013.

[88] L. De Lathauwer, “Blind separation of exponential polynomials and the decomposition of a tensor in rank-(Lr, Lr, 1) terms,” SIAM J. Matrix Analysis Applications, vol. 32, no. 4, pp. 1451–1474, 2011.

[89] L. De Lathauwer and D. Nion, “Decompositions of a higher-order tensor in block terms – Part III: Alternating least squares algorithms,” SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 30, no. 3, pp. 1067–1083, 2008.

[90] I. V. Oseledets and E. E. Tyrtyshnikov, “Algebraic wavelet transform via quantics tensor train decomposition,” SIAM J. Scientific Computing, vol. 33, no. 3, pp. 1315–1328, 2011.

[91] I. Oseledets, “Approximation of 2^d × 2^d matrices using tensor decomposition,” SIAM J. Matrix Analysis Applications, vol. 31, no. 4, pp. 2130–2145, 2010.

[92] B. Khoromskij, “O(d log N)-quantics approximation of N-d tensors in high-dimensional numerical modeling,” Constructive Approximation, vol. 34, no. 2, pp. 257–280, 2011.

[93] S. V. Dolgov, B. Khoromskij, I. Oseledets, and D. V. Savostyanov, “Computation of extreme eigenvalues in higher dimensions using block tensor train format,” Computer Physics Communications, vol. 185, no. 4, pp. 1207–1216, 2014.

[94] V. Kazeev and B. Khoromskij, “Low-rank explicit QTT representation of the Laplace operator and its inverse,” SIAM J. Matrix Analysis Applications, vol. 33, no. 3, pp. 742–758, 2012.

[95] N. Nakatani and G. Chan, “Efficient tree tensor network states (TTNS) for quantum chemistry: Generalizations of the density matrix renormalization group algorithm,” The Journal of Chemical Physics, 2013.

[96] V. Kazeev, M. Khammash, M. Nip, and C. Schwab, “Direct solution of the Chemical Master Equation using Quantized Tensor Trains,” PLOS Computational Biology, March 2014.

[97] D. Kressner, M. Steinlechner, and A. Uschmajew, “Low-rank tensor methods with subspace correction for symmetric eigenvalue problems,” (in print), 2014. [Online]. Available: http://sma.epfl.ch/~uschmaje/paper/EVAMEN.pdf