semantic scholar › a5e5 › ac9771d2e3c6...version september 9, 2017 submitted to entropy 3 of 31...
TRANSCRIPT
Article
The Connection between Information Geometry andGeometric Mechanics
Melvin Leok 1,: ID and Jun Zhang 2,:*1 University of California, San Diego; [email protected] University of Michigan, Ann Arbor; [email protected]* Correspondence: [email protected]; Tel.: +1-734-763-6161: These authors contributed equally to this work.
Academic Editor: nameVersion September 9, 2017 submitted to Entropy
Abstract: The divergence function in information geometry, and the discrete Lagrangian in discrete1
geometric mechanics each induce a differential geometric structure on the product manifold QˆQ.2
We aim to investigate the relationship between these two objects, and the fundamental role that3
duality, in the form of Legendre transforms, play in both fields. By establishing an analogy between4
these two approaches, we will show how a fruitful cross-fertilization of techniques arise. In particular,5
we establish, through variational error analysis that the divergence function agrees with exact discrete6
Lagrangian up to third order if and only if Q is a Hessian manifold.7
Keywords: Lagrangian, Hamiltonian, Legendre map, symplectic form, divergence function,8
generating function, Hessian manifold9
1. Introduction10
Information geometry and geometric mechanics each induce geometric structures on an arbitrary11
manifold Q, and we investigate the relationship between these two approaches. More specifically, we12
study the interaction of three objects: TQ, the tangent bundle on which a Lagrangian function Lpq, vq13
is defined; T˚Q, the cotangent bundle on which a Hamiltonian function Hpq, vq is defined; and QˆQ,14
the product manifold on which the divergence function (from the information geometric perspective)15
or the Type I generating function (from the geometric mechanics perspective) is defined. In discrete16
mechanics, while the QˆQ ÐÑ T˚Q correspondence is via a symplectomorphism given by the time-h17
flow map associated with the Hamiltonian Hpq, pq, and the QˆQ ÐÑ TQ correspondence is via the18
map relating the boundary-value and initial-value formulation of the Euler–Lagrange flow, it is the19
correspondence between TQ ÐÑ T˚Q through the fiberwise Legendre map based on L or H that20
actually serves to couple the Hamiltonian flow with Lagrangian flow, leading to one and the same21
dynamics. We propose a decoupling of the Lagrangian and Hamiltonian dynamics through the use22
of a divergence function Dpq, v, pq defined on the Pontryagin bundle TQ‘ T˚Q that measures the23
discrepancy (or duality gap) between Hpq, pq and Lpq, vq. We also establish, through variational error24
analysis that the divergence function agrees with exact discrete Lagrangian up to third order if and25
only if Q is a Hessian manifold.26
Geometric mechanics investigates the equations of motion using the Lagrangian, Hamiltonian,27
and Hamilton–Jacobi formulations of classical Newtonian mechanics. Two apparently different28
principles were used in those formulations: the principle of conservation (energy, momentum, etc)29
leading to Hamiltonian dynamics and the principle of variation (least action) leading to Lagrangian30
dynamics. The conservation properties of the Hamiltonian approach are with respect to the underlying31
symplectic geometry on the cotangent bundle, whereas the variational principles that result in the32
Submitted to Entropy, pages 1 – 31 www.mdpi.com/journal/entropy
Version September 9, 2017 submitted to Entropy 2 of 31
Euler–Lagrange equation of motions and the Hamilton–Jacobi equations reflect the geometry of the33
semispray on the tangent bundle. Lagrangian and Hamiltonian mechanics reflect two sides of the same34
coin – that they describe the identical dynamics on the configuration space (base manifold) is both35
remarkable and also to be expected due to their construction: because the Lagrangian and Hamiltonian36
are dual to each other, and related via the Legendre transform.37
Information geometry [2,3] in the broadest sense of the term, provides a dualistic Riemannian38
geometric structure that is induced by a class of functions called divergence functions, which essentially39
provide a method of smoothly measuring a directed distance between any two points on the manifold,40
where the manifold is the space of probability densities. It arises in various branches of information41
science, including statistical inference, machine learning, neural computation, information theory,42
optimization and control, etc. Various geometric structures can be induced from divergence functions,43
including metric, affine connection, symplectic structure, etc., and this is reviewed in [26]. Convex44
duality and the Legendre transform play a key role in both constructing the divergence function and45
characterizing the various dualities encoded by information geometry [22].46
Given that geometric mechanics and information geometry both prescribe dualistic geometric47
structures on a manifold, it is interesting to explore the extent to which these two frameworks are48
related. In geometric mechanics, the Legendre transform provides a link between the Hamiltonian49
function that is defined on the cotangent bundle T˚Q, with the Lagrangian function that is defined on50
the tangent bundle TQ, whereas in information geometry, it provides a link between the biorthogonal51
coordinates of the base manifold Q if it is dually-flat and exhibits the Hessian geometry. To understand52
their deep relationship, it turns out that we need to resort to the discrete formulation of geometric53
mechanics, and investigate the product manifold Q ˆ Q. The basic tenet of discrete geometric54
mechanics is to preserve the fact that Hamiltonian flow is a symplectomorphism, and construct55
discrete time maps that are symplectic. This results in two ways of viewing discrete-time mechanics,56
either as maps on QˆQ or T˚Q, which are related via discrete Legendre transforms. The shift in focus57
from T˚Q to QˆQ precisely lends itself to establishing a connection to information geometry, as the58
divergence function is naturally defined on QˆQ, and in both information geometry and discrete59
geometric mechanics, induces a symplectic structure on Qˆ Q. This is the basic observation that60
connects geometric mechanics and information geometry, and we will explore the implications of this61
connection in the paper.62
Our paper is organized as follows. Section 2 provides a contemporary viewpoint of geometric63
mechanics, with Lagrangian and Hamiltonian systems discussed in parallel with one another, in64
terms of geometry on TQ and T˚Q, respectively, including a discussion of Dirac mechanics on the65
Pontryagin bundle TQ‘ T˚Q, which provides a unified treatment of Lagrangian and Hamiltonian66
mechanics. Section 3 summarizes the results of the discrete formulation of geometric mechanics, which67
is naturally defined on the product manifold QˆQ. Section 4 is a review of now-classical information68
geometry, including the Riemannian metric and affine connections on TQ, and the manner in which the69
divergence function naturally induces dualistic Riemannian structures. The special cases of Hessian70
geometry and biorthogonal coordinates are highlighted, showing how the Legendre transform is71
essential for characterizing dually-flat spaces. Section 5 starts with a presentation of the symplectic72
structure on QˆQ induced by a divergence function, which is naturally identified with the Type I73
generating function on it. We follow up by investigating its transformation into a Type II generating74
function (which plays a key role in discrete Hamiltonian mechanics). We then propose to decouple75
the discrete Hamiltonian and Lagrangian dynamics by using the divergence function to measure76
their duality gap. Finally, we perform variational error analysis to show that on a dually-flat Hessian77
manifold, the Bregman divergence is third-order accurate with respect to the exact discrete Lagrangian.78
Section 6 closes with a summary and discussion.79
Version September 9, 2017 submitted to Entropy 3 of 31
2. A Review of Geometric Mechanics80
Consider an n-dimensional configuration manifold Q, with local coordinates pq1, . . . , qnq. The81
Lagrangian formulation of mechanics is defined on the tangent bundle TQ, in terms of a Lagrangian82
L : TQ Ñ R. From this, one can construct an action integral S which is a functional of the curve83
q : rt1, t2s Ñ Q, given by84
Spqq “ż t2
t1
Lpqptq, 9qptqq dt.
Then, Hamilton’s variational principle states that,85
δSpqq “ 0, (1)
where the variation δS is induced by an infinitesimal variation δq of the trajectory q, subject to the86
condition that the variations vanish at the endpoints, i.e., δqpt1q “ δqpt2q “ 0. Applying standard87
results from the calculus of variations, we obtain the following Euler–Lagrange equations of motion,88
ddtBLB 9qi ´
BLBqi “ 0. (2)
The Hamiltonian formulation of mechanics is defined on the cotangent bundle T˚Q, and the89
fiberwise Legendre transform, FL : TQ Ñ T˚Q, relates the tangent bundle and cotangent bundle as90
follows,91
pqi, 9qiq ÞÑ pqi, piq “
ˆ
qi,BLB 9qi
˙
,
where pi is the conjugate momentum to qi:92
p “BLB 9q
. (3)
The term fiberwise is used to emphasize the fact that FL establishes a pointwise correspondence between93
TqQ and T˚q Q for any point q on Q. The cotangent bundle forms the phase space, on which one can94
define a Hamiltonian H : T˚Q Ñ R,95
Hpq, pq “ xp, 9qpq, pqy ´ Lpq, 9qpq, pqq,
where 9q is viewed as a function of pq, pq by inverting the Legendre transform (3), and96
xp, vyq “ÿ
i
pivi
denotes the duality or natural pairing between a vector v and covector p at the point q P Q. A97
straightforward calculation shows that98
BHBpi
“ 9qi , (4)
and99BHBqi “ ´
BLBqi .
From these, we transform the Euler–Lagrange equations into Hamilton’s equations,100
dqi
dt“BHBpi
,dpidt“ ´
BHBqi . (5)
Version September 9, 2017 submitted to Entropy 4 of 31
The canonical symplectic form ωcan on T˚M can be identified with a quadratic form induced by the101
skew-symmetric matrix J, i.e., ωcanpv, wq “ vT Jw. With that identification, Hamilton’s equations can102
be expressed as,103«
9q9p
ff
“
«
0 I´I 0
ff «
BHBqBHBp
ff
“ J
«
BHBqBHBp
ff
.
Alternatively, Hamilton’s equations (5) can be derived using Hamilton’s phase space variational104
principle, which states that,105
δ
ż t2
t1
rxp, 9qy ´ Hpq, pqs dt “ 0,
for infinitesimal variations δq that vanish at the endpoints. The infinitesimal variation of the integral106
is computed by differentiating under the integral, integrating by parts, and using the fact that the107
infinitesimal variations δq vanish at the endpoints, which yields:108
ż t2
t1
ˆ
xδp, 9qy ` xp, δ 9qy ´BHBq
δq´BHBp
δp˙
dt “ż t2
t1
„ˆ
´ 9p´BHBq
˙
δq`ˆ
9q´BHBp
˙
δp
dt,
and by the fundamental theorem of the calculus of variations, which states that the integral is stationary109
only when the terms in the parentheses multiplying into the independent variations δq and δp vanish,110
we recover Hamilton’s equations (5).111
Lagrangian and Hamiltonian mechanics are typically viewed as different representations of the112
same dynamical system, with the Legendre transform relating the two formulations. Here, the Legendre113
transform FL (with FH as its inverse) refers to both the map relating two sets of variables, pq, 9qq with114
pq, pq, as well as the relationship between two functions, the Lagrangian Lpq, 9qq and the Hamiltonian115
Hpq, pq. The Legendre transform links pairs of convex conjugate functions; in classical mechanics,116
the Lagrangian L and Hamiltonian H are always related in this sense of forming a convex pair. The117
requirement that Lpq, 9qq be strictly convex in the variable 9q is referred to as hyperregularity. When118
the Lagrangian is positive homogeneous (or singular), the Legendre transform yields a Hamiltonian119
function that is identically zero, which means that in such cases, the Hamiltonian analogue of the120
Lagrangian system does not exist, which is problematic in the context of analytic mechanics. In order121
to address such degeneracy, it is necessary to consider Dirac mechanics on Dirac manifolds, which is a122
simultaneous generalization of Lagrangian and Hamiltonian mechanics.123
In geometric mechanics, including the contemporaneous Dirac formulation, the Lagrangian L124
and Hamiltonian H are always coupled via the fiberwise Legendre transform FL. In information125
geometry, it is a well-known fact that one can construct the divergence function (to be defined later),126
which captures the departure from such perfect coupling. In other words, we can view Lagrangian and127
Hamiltonian systems as two separate systems, which are endowed with their own dynamics and are in128
some sense dual to each other, and we then use the divergence function to measure their duality gap.129
For this reason, we will review the Lagrangian and Hamiltonian formulation of mechanics in terms130
of Lpq, 9qq and Hpq, pq, respectively, without necessarily assuming that the Lagrangian and Hamiltonian are131
related by the Legendre transform.132
2.1. Lagrangian mechanics as an extremization system on TQ133
As noted previously, the Euler–Lagrange equations (2) arise from the stationarity conditions that134
describe the extremal curves of the action integral, over the class of varied curves that fix the endpoints.135
Carrying out the differentiation in (2) explicitly yields,136
ÿ
j
B2LBviBvj
dvj
dt`ÿ
j
vj B2L
BqjBvi ´BLBqi “ 0. (6)
Version September 9, 2017 submitted to Entropy 5 of 31
The fundamental tensor gij associated with the Lagrangian Lpq, vq is given by,137
gijpq, vq “B2Lpq, vqBviBvj ,
which is assumed to be positive-definite, i.e., the Lagrangian L is hyperregular. Let gil denote the138
matrix inverse of gij, then (6) can be written as139
d2qi
dt2 ` 2Giˆ
q,dqdt
˙
“ 0, (7)
where140
Gipq, vq “ÿ
l
gil
2
˜
ÿ
k
B2LBqkBvl vk ´
BLBql
¸
.
So, (7) with the above Gi are Euler–Lagrange equations in disguise, and its solution is an extremal141
curve of the action integral.142
Recall that a smooth curve on Q can be lifted to a curve on TQ in a natural way: a curve143
t ÞÑ qptq P Q becomes t ÞÑ pqptq, 9qptqq P TQ. Given an arbitrary Gipq, vq, the system of equations (7)144
specify a family of curves, called a semispray. As seen above, semisprays arise naturally in variational145
calculus as extremal curves of the action integral associated with a Lagrangian.146
Semisprays can be more generally described by a vector field. Recall that a vector field on Q is147
a section of TQ. Now, consider a vector field on the tangent bundle TQ; it is a section of the double148
tangent bundle TpTQq. The integral surfaces of the semispray induces a decomposition of the total149
space TpTQq into the horizontal subspace HpTQq and the vertical subspace VpTQq, which defines an150
Ehresmann connection. A vector on TQ encodes the second-order derivative of curves on Q, and a151
semispray defines the following vector field V on TQ:152
Vpq, vq “ÿ
i
˜
vi B
Bqi
ˇ
ˇ
ˇ
ˇ
pq,vq´ 2Gipq, vq
B
Bvi
ˇ
ˇ
ˇ
ˇ
pq,vq
¸
,
where the factor ´2 is there by convention. The integral curve of the semispray satisfies the153
second-order ODE (7), and we say that a semispray is a vector field on the tangent bundle TQ154
which encodes a second-order system of differential equations on the base manifold Q.155
A semispray is called a full spray if the spray coefficients Gi satisfy156
Gipq, λvq “ λ2Gipq, vq,
for λ ą 0. In this case, the integral curve remains invariant under reparameterization by a positive157
number, i.e., it satisfies homogeneity. When the semispray becomes a (full) spray, the Lagrange158
geometry becomes Finsler geometry, and the fundamental tensor gijpq, vq becomes the Finsler–Riemann159
metric tensor (which includes the Riemann metric as a special case).160
As noted above, a semispray induces an Ehresmann connection on Q and this connection is161
torsion-free and typically nonlinear. Conversely, given a torsion-free connection, one can construct a162
semispray. The connection is homogenous if and only if the semispray is a full spray. Moreover, if the163
spray is affine, then the connection is affine as well – an affine spray Gi takes the form164
Gi “12
ÿ
jk
Γijkpqq vj vk,
where Γijk is referred to as the affine connection.165
Version September 9, 2017 submitted to Entropy 6 of 31
To summarize, Lagrangian dynamics is related to action minimization by the Euler operator, and166
leads to a semispray on the configuration manifold Q. Under suitable conditions, the Lagrangian167
function Lpq, vq defined on TQ will lead to a torsion-free but generally nonlinear connection, and an168
affine connection only for a very special form of Lagrangian.169
2.2. Hamiltonian as a conservative system on T˚Q170
Given a Hamiltonian H : T˚Q Ñ R, we consider the Hamiltonian vector field XH P XpT˚Qq defined171
by172
XH “
ˆ
BHBpi
,´BHBqi
˙
. (8)
It is straightforward to verify that H “ const along the dynamical flow of XH :173
dHdt“ÿ
i
ˆ
BHBqi
dqi
dt`BHBpi
dpidt
˙
“ 0.
So, a Hamiltonian vector field XH advects the Hamiltonian H along its flow, so that H is constant174
along solution curves, which implies that the Lie derivative L of H along the flow of XH vanishes,175
LXH H “ 0.
Formally, starting from the tautological 1-form θ “ř
i pi dqi on Q, one obtains a 2-form ω, called176
the Poincaré 2-form,177
ω “ ´dθ “ÿ
i
dqi ^ dpi,
which is the canonical symplectic form on T˚Q:178
ωpX, Yq “ÿ
i
pdqi ^ dpiqpX, Yq “ÿ
i
ˆ
BXBqi
BYBpi
´BXBpi
BYBqi
˙
,
where X, Y are vector fields on T˚Q.179
More generally, given a Hamiltonian H along with a symplectic form ω, which is, by definition, a180
closed, nondegenerate 2-form, one obtains the Hamiltonian vector field XH on T˚Q, defined in abstract181
notation by182
ιXH ω “ dH, (9)
or equivalently in a more familiar notation,183
ωpXH , ¨q “ dHp¨q.
One can define the Poisson bracket t¨, ¨u of two functions F and G by using their respective184
Hamiltonian vector fields and the symplectic form,185
tF, Gu ” ωpXF, XGq.
For the canonical symplectic form, it has the following coordinate expression,186
tF, Gu “ÿ
i
ˆ
BFBqi
BGBpi
´BFBpi
BGBqi
˙
.
In this way, Hamilton’s equations can be expressed in terms of the Poisson bracket as follows,187
dqi
dt“ tqi, Hu,
dpi
dt“ tpi, Hu. (10)
Version September 9, 2017 submitted to Entropy 7 of 31
By Darboux’s theorem, it is always possible to choose local coordinates pq, pq on T˚Q, referred to188
as canonical coordinates, such that the symplectic form has the expression ω “ř
i dqi ^ dpi. In these189
coordinates, Hamilton’s equations defined by (9) and (10) take the form given in (8).190
Note that any smooth function H on T˚Q induces a Hamiltonian vector field. An arbitrary vector191
field X on T˚Q is locally Hamiltonian, i.e., induced by a smooth function H, if ιXω is closed, i.e.,192
LXω “ 0. In addition, a Hamiltonian vector field preserves the volume form Ω, i.e.,193
LXpΩq “ 0,
where Ω is the n-fold exterior product of ω,194
Ω “p´1q
npn´1q2
n!ω^ω ¨ ¨ ¨ ^ω.
2.3. Symplectic maps and symplectic flows195
A symplectic map is a diffeomorphism of T˚Q that preserves its symplectic structure ω. We first196
consider a one-parameter family of symplectic maps generated by the flow map FX : T˚Q Ñ T˚Q of197
a vector field X P XpT˚Qq (where X denotes a section). Since the entire family of symplectic maps198
leave ω invariant, it follows that £Xω “ 0. It can be shown (using Cartan’s magic formula, and the fact199
that ω is closed) that a vector field X P XpT˚Qq is symplectic if iXω is closed, i.e., dpiXωq “ 0. By the200
Poincaré lemma, this implies that iXω is locally exact, that is, in the neighborhood of any point, there201
exists some function H : T˚Q Ñ R such that iXω “ dH. So there is always locally exists a Hamiltonian202
H : T˚Q Ñ R that generates a vector field X whose flow is symplectic with respect to ω.203
More generally, a diffeomorphism φ : M1 Ñ M2 is a symplectic map from a symplectic space204
pM1, ω1q to another space pM2, ω2q if:205
φ˚ω2 “ ω1, (11)
where ω1, ω2 are the symplectic forms on M1, M2, respectively. The above condition (11) holds if and206
only if for any functions f , g:207
(i) t f , gu ˝ φ “ t f ˝ φ, g ˝ φu,208
(ii) φ˚XH “ XH˝φ.209
With respect to Darboux coordinates about a point z P M1, the condition (11) that a map φ : M1 Ñ M2210
is symplectic can be expressed locally by pDzφqT JpDzφq “ J, where DzF denotes the Jacobian of φ at z.211
A canonical transformation of T˚M is an automorphism φ : T˚Q Ñ T˚Q,212
φ :
#
pq, pq Ñ pq1qipq1, ¨ ¨ ¨ , qn, p1, ¨ ¨ ¨ , pnq,
pq, pq Ñ pp1qipq1, ¨ ¨ ¨ , qn, p1, ¨ ¨ ¨ , pnq,
such that213
ω “ÿ
i
dqi ^ dpi “ÿ
i
dq1i ^ dp1i.
The significance of canonical transformations is that they preserve the form of Hamilton’s equations,214
and one can check that an automorphism φ is canonical by verifying that pDzφqT JpDzφq “ J in a215
Darboux coordinate chart.216
2.4. Symplectic structure on TQ pulled back from T˚Q217
If we endow T˚Q with the canonical symplectic form, we can construct a symplectic form on TQ218
in such a way that these two spaces are symplectomorphic.219
Version September 9, 2017 submitted to Entropy 8 of 31
The mapping between TQ and T˚Q can be constructed in two different ways, Case I involves the220
Legendre transform:221
TQ ÐÑ T˚Qpq, vq ÞÑ
´
q, BLpq,vqBv
¯
´
q, BHpq,pqBp
¯
Ð[ pq, pq
(12)
and Case II involves the Riemannian metric tensor g (on Q):222
TQ ÐÑ T˚Qpq, vq ÞÑ
´
q,ř
j gijvj¯
´
q,ř
j gij pj
¯
Ð [ pq, pq
(13)
Note that we say that g is a Riemannian metric on Q when g acts on a pair of tangent vectors at the223
tangent space TqQ at a point q of Q; it can be viewed as a p0, 2q-tensor that maps TQˆ TQ Ñ R. On224
the other hand, the symplectic form ω is a skew-symmetric p0, 2q-tensor that acts on a pair of tangent225
vectors on T˚Q, so it maps TpT˚Qq ˆ TpT˚Qq Ñ R.226
Case I. Given the Lagrangian L : TQ Ñ R, this induces the fiberwise Legendre transform227
FL : TQ Ñ T˚Q, which is given by pq, vq ÞÑ pq, pq “´
q, BLBv
¯
. If L is hyperregular, then this map is a228
diffeomorphism. If we endow TQ with the pullback symplectic form pFLq˚ω, which is given by229
ÿ
ij
B2LBvjBvi dqi ^ dvj `
ÿ
ij
B2LBqiBvj dqi ^ dqj,
then the Legendre transform is a symplectomorphism (by construction).230
Case II. The Riemannian metric g induces the musical isomorphisms 5 : TQ Ñ T˚Q and 7 : T˚Q Ñ231
TQ between TQ and T˚Q, which are the operations that lower and raise the index, respectively. If we232
endow TQ with the pullback symplectic form 5˚ω, which is given by233
ω “ dq^ dppq, vq “ÿ
i,j
gijdqi ^ dvj `ÿ
ijk
Bgij
Bqk vi dqj ^ dqk,
then the musical isomorphism is a symplectomorphism (by construction).234
Link between Case I and Case II. It is possible that the two ways of identifying TQ Ø T˚Q may235
be the same; this happens when g on TQ coincides with the second derivatives of Lpq, vq with respect236
to the v-variable:237
gijpq, vq “B2Lpq, vqBviBvj
,
assuming L is hyperregular. The inverse of g, denoted g, can be obtained from238
gijpq, pq “B2Hpq, pqBpiBpj
,
using the Hamiltonian Hpq, pq defined on T˚Q. Note that when the Lagrangian has the form Lpq, vq “239
12 vT Mpqqv ´ Vpqq, this corresponds to the Riemannian metric g being given by the kinetic energy240
metric Mpqq.241
Version September 9, 2017 submitted to Entropy 9 of 31
2.5. Hamilton-Jacobi theory and Dirichlet-to-Neumann map242
In classical mechanics, the Hamilton–Jacobi equation is first introduced as a partial differential243
equation that the action integral satisfies. Recall that the action integral S along the solution of the244
Euler–Lagrange equation (2) over the time interval rt0, ts is245
Sq0pqptqq :“ż t
t0
Lpqpsq, 9qpsqqds, where qp¨q is the solution of the Euler–Lagrange equation. (14)
This is referred to as Jacobi’s solution of the Hamilton–Jacobi equation. Here, we assume that the initial246
position qp0q is fixed and the final position qptq depends on the initial velocity v0 “ 9qp0q. By taking a247
variation δqptq of the endpoint qptq, one obtains a partial differential equation satisfied by Spq, tq:248
Hˆ
q,BSBq
˙
“ 0. (15)
This is the Hamilton–Jacobi equation, when H does not explicit depend on t.249
Conversely, it is shown that if Sq0pqq is a solution of the Hamilton–Jacobi equation then Sq0pqq is250
a generating function for the family of canonical transformations (or symplectic flows) that describe251
the dynamics defined by Hamilton’s equations. This result is the theoretical basis for the powerful252
technique of exact integration called separation of variables.253
There are two uses of Sq0pqq. First, it serves to characterize the Dirichlet-to-Neumann map, which254
refers to the correspondence between the boundary data pq0, qq P QˆQ with the initial data pq0, v0q P255
TQ of a dynamical system. Second, it provides a foliation of the configuration space Q, around the256
point q0 and parameterized by t, that is defined by the condition Sq0pqq “ const.257
In the rest of the paper, we will view Sq0pqq as a scalar-valued function of pq0, qptqq, which we258
refer to as the exact discrete Lagrangian LEd : QˆQ Ñ R,259
LEd pq0, q1; hq “ ext
qPC2pr0,hs,Qqqp0q“q0,qphq“q1
ż h
0Lpqptq, 9qptqqdt. (16)
This is equivalent to the expression for Jacobi’s solution, as the stationarity conditions of this variational260
characterization are simply the Euler–Lagrange equations. Furthermore, this characterization has the261
added benefit that it is well-defined even if the Lagrangian is degenerate. The exact discrete Lagrangian262
provides us with the time-h flow map for the Euler–Lagrange equation. Given a fixed initial point q0,263
this defines a map which takes q P Q to an initial velocity v0, such that the Euler–Lagrange trajectory264
qptqwith initial condition pqp0q, 9qp0qq “ pq0, v0q has boundary values pqp0q, qphqq “ pq0, qq. This is the265
Dirichlet-to-Neumann map QˆQ Ø TQ, pq0, qq ÞÑ pq0, v0q.266
To address the Dirichlet-to-Neumann map more generally, let us first recall the definition of a267
retraction:268
Definition 2.1 ([1], Definition 4.1.1 on p. 55). A retraction on a manifold Q is a smooth mapping R : TQ Ñ269
Q with the following properties: Let Rq : TqQ Ñ Q be the restriction of R to TqQ for an arbitrary q P Q; then,270
(i) Rqp0qq “ q, where 0q denotes the zero element of TqQ;271
(ii) with the identification T0qpTqQq » TqQ, Rq satisfies272
T0qRq “ idTqQ, (17)
where T0qRq is the tangent map of Rq at 0q P TqQ.273
Version September 9, 2017 submitted to Entropy 10 of 31
Eq. (17) implies that the map Rq : TqQ Ñ Q is invertible in some neighborhood of 0q in TqQ. Its274
inverse is conveniently denoted as R : TQ Ñ QˆQ, which is defined by275
Rpvqq :“ pq,Rqpvqqq. (18)
It is easy to see that R : TQ Ñ QˆQ is also invertible in some neighborhood of 0q P TQ for any q P Q.276
Let us introduce a special class of coordinate charts that are compatible with a given retraction277
map R : TQ Ñ Q. A coordinate chart pU, ϕq with U an open subset in Q and ϕ : U Ñ Rn is said to be278
retraction compatible at q P U if279
(i) ϕ is centered at q, i.e., ϕpqq “ 0;280
(ii) the compatibility condition281
Rpvqq “ ϕ´1 ˝ Tq ϕpvqq (19)
holds, where we identify T0Rn with Rn as follows: Let ϕ “ pq1, . . . , qnq with qi : U Ñ R for282
i “ 1, . . . , n. Then283
vi B
Bqi ÞÑ pv1, . . . , vnq, (20)
where BBqi is the unit vector in the qi-direction in T0Rn.284
An atlas for the manifold Q is retraction compatible if it consists of retraction compatible coordinate285
charts.286
In Eq. (19), we assumed that Tq ϕpvqq P ϕpUq Ă Rn and so strictly speaking Rq is defined on287
pTq ϕq´1pϕpUqq Ă TqQ. However, it is always possible to define a coordinate chart such that ϕpUq “ Rn288
by stretching out the open set ϕpUq to Rn so that (19) is defined for any vq P TqQ.289
Retraction maps provide general means to relate Q ˆ Q to TQ: in essence it provides a290
correspondence between tqu ˆQ and TqQ for all q P Q (we may take q P Q to mean the projection of291
QˆQ onto either the first or the second slot).292
2.6. Variational mechanics and the Pontryagin bundle293
Lagrangian and Hamiltonian mechanics can be combined into Dirac mechanics, which is described294
on the Pontryagin bundle TQ‘ T˚Q, which has position, velocity, and momentum as local coordinates.295
Just as the Euler–Lagrange equations of motion arises out of Hamilton’s principle, Hamilton’s296
equations can also arise from Hamilton’s phase space principle:297
δ
ż t2
t1
rxp, 9qy ´ Hpq, pqs dt “ 0.
On the Pontryagin bundle TQ‘ T˚Q, which has local coordinates pq, v, pq, a relaxation of Hamilton’s298
principle (1) is the Hamilton–Pontryagin variational principle, which uses a Lagrange multiplier p to299
impose the second-order condition v “ 9q,300
δ
ż t2
t1
`
Lpq, vq ´ xp, 9q´ vyq˘
dt “ 0. (21)
This encapsulates both Hamilton’s and Hamilton’s phase space variational principles, as well as the301
Legendre transform, and gives the implicit Euler–Lagrange equations,302
9q “ v, 9p “BLBq
, p “BLBv
.
The last equation explicitly imposes the primary constraint condition, which is important when303
describing degenerate Lagrangian systems, such as electrical circuits. Note that the p are interpreted as304
Lagrange multipliers (Tulczyjew and Urbanski, 1999) in addition to its usual interpretation as conjugate305
Version September 9, 2017 submitted to Entropy 11 of 31
momenta. The three equations can be combined by eliminating v and p to recover the Euler–Lagrange306
equations.307
An important application of Hamilton–Jacobi theory is in optimal control theory. Consider a308
typical optimal control problem,309
minup¨q
ż T
0Lpq, uq dt,
subject to the constraints,310
9q “ f pq, uq,
and the boundary conditions qp0q “ q0 and qpTq “ qT . We convert constrained optimization to311
unconstrained optimization by using Lagrange multipliers p (sometimes called the costate or auxiliary312
variables), and we can define the augmented cost functional:313
Srus :“ż T
0tLpq, uq ` xp, 9q´ f pq, uqyu dt “
ż T
0
xp, 9qy ´ Hpq, p, uq(
dt,
where we introduced the costate variables p, and also defined the control Hamiltonian,314
Hpq, p, uq :“ xp, f pq, uqy ´ Lpq, uq.
The variables pq, pq forms a Hamiltonian system, so we impose the optimality condition,315
BHBupq, p, uq “ 0,
to obtain the equation for the optimal control u “ u˚pq, pq, and we obtain the Hamiltonian,316
Hpq, pq :“ maxu
Hpq, p, uq “ Hpq, p, u˚pq, pqq.
We also define the optimal cost-to-go function,317
Cpq, tq :“ż T
t
!
Lpq, u˚q ` xp, 9q´ f pq, u˚qy)
ds
“
ż T
t
!
xp, 9qy ´ Hpq, pq)
ds “ S˚ ´ Spq, tq,
where pqpsq, ppsqq for s P r0, Ts is the solution of Hamilton’s equations with the above H such that318
qptq “ q; and S˚ is the optimal cost319
S˚ :“ż T
0
!
xp, 9qy ´ Hpq, pq)
ds “ż T
0
!
xp, 9qy ´ Hpq, p, u˚pq, pqq)
ds “ Sru˚s,
and the function Spq, tq is defined by320
Spq, tq :“ż t
0
!
xp, 9qy ´ Hpq, pq)
ds.
Since this definition coincides with (14), the function Spq, tq “ S˚´Cpq, tq satisfies the Hamilton–Jacobi321
equation (15); this reduces to the Hamilton–Jacobi–Bellman (HJB) equation for the optimal cost-to-go322
function Cpq, tq:323
BCBt`min
u
"B
BCBq
, f pq, uqF
` Lpq, uq*
“ 0. (22)
It can also be shown that the costate p of the optimal solution is related to the solution of the324
Hamilton–Jacobi–Bellman equation.325
Version September 9, 2017 submitted to Entropy 12 of 31
3. Discrete Formulation of Geometric Mechanics326
In this section, we review various schemes for discretizing mechanics (see, e.g., [15]). Geometric327
mechanics focuses on the differential geometric structure of the configuration manifolds, the associated328
symplectic and Poisson structures on the phase space, and the conservation laws generated by329
symmetries, and geometric structure-preserving numerical integration aims to preserve as many330
of these geometric properties as possible under discretization. The main idea is to start from the331
canonical symplectic form ω on T˚Q, and look at the symplectomorphisms that preserve ω or its332
pullback via the Legendre transforms to QˆQ or TQ.333
3.1. Symplectomorphisms from T˚Q to TQ and to QˆQ334
Given a cotangent bundle T˚Q with a symplectic form ω, we wish to endow the bundles TQ and335
QˆQ with a symplectic structure. Given a function L : TQ Ñ R, the Legendre transform is viewed as336
the fiber derivative FL : TQ Ñ T˚Q, pq, 9qq ÞÑ pq, BLB 9qq. The pullback of ω with respect to FL yields a337
symplectic structure ωL “ FL˚ω on TQ.338
Similarly, given a function Ld : QˆQ Ñ R, we define two discrete fiber derivatives, FL˘d : QˆQ Ñ339
T˚Q, which serve as discrete Legendre transforms:340
FL`d pqk, qk`1q “ pqk`1, D2Ldpqk, qk`1qq, (23)
FL´d pqk, qk`1q “ pqk,´D1Ldpqk, qk`1qq. (24)
Here D1, D2 refers to taking a derivative with respect to the first or second slot, respectively:341
D1Ldpqk, qk`1q “BLdpqk, qk`1q
Bqk, D2Ldpqk, qk`1q “
BLdpqk, qk`1q
Bqk`1.
The two choices of discrete fiber derivatives correspond to whether one views QˆQ as a bundle over342
Q with respect to π´ : pqk, qk`1q ÞÑ qk or π` : pqk, qk`1q ÞÑ qk`1, i.e., projection onto the first or the343
second slot. These induce symplectic structures ω˘Ld“ pFL˘d q
˚ω on QˆQ by pullback.344
pTQ, ωLq //
FL
pTQ, ωLq
FL
pT˚Q, ωq
F //
FL´d
~~
FL`d
pT˚Q, ωq
FL´d
~~
FL`d
pQˆQ, ω´Ld
q99
pQˆQ, ω`Ldq
99pQˆQ, ω´Ld
q pQˆQ, ω`Ldq
345
Let F : T˚Q Ñ T˚Q be a symplectic map and let the maps denoted by the dotted arrows in the346
figure above be defined by requiring that the diagram commutes. Then, these maps are also symplectic347
maps, and the fiber derivative FL is a symplectomorphism between pTQ, ωLq and pT˚Q, ωq, and the348
discrete fiber derivatives FL˘d are symplectomorphisms between pQˆQ, ω˘Ldq and pT˚Q, ωq.349
Version September 9, 2017 submitted to Entropy 13 of 31
3.2. Discrete Lagrangian mechanics350
The aim of geometric structure-preserving numerical integration is to preserve as many geometric351
conservation laws as possible under discretization. Discrete variational mechanics [15] is based on the352
discrete Hamilton’s principle,353
δÿn´1
i“0Ldpqi, qi`1q “ 0, (25)
where the endpoints q0 and qn are fixed, and the discrete Lagrangian, Ld : Q ˆ Q Ñ R, is a Type I354
generating function of the symplectic map. Recall that there exists an exact discrete Lagrangian (16) LEd ,355
that generates the exact time-h flow of a Lagrangian system, but it cannot be computed in general. One356
possible method of constructing computable discrete Lagrangians is the Galerkin approach, which357
involves replacing the infinite-dimensional function space C2pr0, hs, Qq and the integral in (16) with a358
finite-dimensional function space and a quadrature formula, respectively. Below are two examples of359
discrete Lagrangians:360
(i) Symplectic midpoint integrator361
Ldpq0, q1, hq “ hLˆ
q0 ` q1
2,
q1 ´ q0
2
˙
.
This can be obtained from the Galerkin approach by considering the family of linear polynomials362
as the finite-dimensional function space, and the midpoint rule as the quadrature formula.363
(ii) Störmer–Verlet integrator364
Ldpq0, q1, hq “h2
„
Lˆ
q0,q1 ´ q0q
2
˙
` Lˆ
q1,q1 ´ q0q
2
˙
.
This can be obtained from the Galerkin approach by considering the family of linear polynomials365
as the finite-dimensional function space, and the trapezoidal rule as the quadrature formula.366
Performing variational calculus on the discrete Hamilton’s principle (25) yields the discrete367
Euler–Lagrange (DEL) equations,368
D2Ldpqk´1, qkq `D1Ldpqk, qk`1q “ 0. (26)
The above equation implicitly defines the discrete Lagrangian map FLd : pqk´1, qkq ÞÑ pqk, qk`1q at points369
sufficiently close to the diagonal of QˆQ. This is equivalent to the implicit discrete Euler–Lagrange370
(IDEL) equations,371
pk “ ´D1Ldpqk, qk`1q, pk`1 “ D2Ldpqk, qk`1q, (27)
which is precisely the characterization of a symplectic map in terms of Type I generating function. It372
implicitly defines the discrete Hamiltonian map FLd : pqk, pkq ÞÑ pqk`1, pk`1q, and it is symplectic with373
respect to the canonical symplectic form ωcan on T˚Q, i.e., F˚Ldωcan “ ωcan.374
The two discrete fiber derivatives FL˘d induce a single unique discrete symplectic form ωLd “ ω˘Ld“375
pFL˘d q˚ωcan on QˆQ,376
ωLdpqk, qk`1q “B2Ld
BqkBqk`1pqk, qk`1qdqk ^ dqk`1, (28)
and the discrete Lagrangian map is symplectic with respect to ωLd on QˆQ, i.e., F˚LdωLd “ ωLd .377
The discrete Lagrangian and Hamiltonian maps can be expressed in terms of the discrete fiber378
derivatives, FLd “ FL`d ˝ pFL´d q´1, and FLd “ pFL´d q
´1 ˝FL`d , respectively. This characterization of the379
discrete flow maps underlies the proof of the variational error analysis theorem.380
Version September 9, 2017 submitted to Entropy 14 of 31
These properties may be summarized in the following commutative diagram,381
pTQ, ωLq //
FL
pTQ, ωLq
FL
pT˚Q, ωcanq
FLd //
FL´d
zz
FL`d
$$
pT˚Q, ωcanq
FL´d
zz
FL`d
$$pQˆQ, ωLdq FLd
// pQˆQ, ωLdq FLd
// pQˆQ, ωLdq
If the exact discrete Lagrangian LEd is used, then the discrete Hamiltonian map FLd is equal to382
the time-h flow map of Hamilton’s equations, and the dotted arrow is the time-h flow map of the383
Euler–Lagrange equations.384
The variational integrator approach to constructing symplectic integrators simplifies the numerical385
analysis of these methods. In particular, the task of establishing the geometric conservation properties386
and order of accuracy of the discrete Lagrangian map FLd reduces to the simpler task of verifying387
certain properties of the discrete Lagrangian instead.388
3.3. Discrete Hamilton–Jacobi formulation389
In the context of discrete variational mechanics, discrete Hamilton–Jacobi theory can be viewed390
as a composition theorem which relates the composition of symplectic maps generated by a Type II391
generating function H`d pqk, pk`1qwith a symplectic map generated by a Type I generating function392
Skdpq0, qkq. By convention, the first argument q0 in the composition generating function is typically393
omitted, and we simply consider it to be a function Skdpqkq of the final position qk.394
The right discrete Hamiltonian, H`d pqk, pk`1q, is related to the discrete Lagrangian by the Legendre395
transform,396
H`d pqk, pk`1q “ xpk`1, qk`1y ´ Ldpqk, qk`1q,
where we impose the condition that pk`1 “ D2Ldpqk, qk`1q. Equivalently, this can be characterized397
variationally by H`d pqk, pk`1q “ extpk rxpk`1, qk`1y ´ Ldpqk, qk`1qs. This leads to a discrete Hamilton’s398
principle in phase space,399
δn´1ÿ
i“0
xpi`1, qi`1y ´ H`d pqi, pi`1q(
“ 0,
which yields the right discrete Hamilton’s equations,400
qk`1 “ D2H`d pqk, pk`1q, pk “ D1H`d pqk, pk`1q, (29)
which is precisely the characterization of a symplectic map in terms of Type II generating function.401
The continuous Hamilton–Jacobi equation can be derived by considering the evolution properties402
of Jacobi’s solution, which is the action integral evaluated along the solution of the Euler–Lagrange403
equations. One can derive a discrete Hamilton–Jacobi theory by considering a discrete analogue of404
Jacobi’s solution, expressed in terms of the right discrete Hamiltonian,405
Skdpqkq ”
k´1ÿ
i“0
pi`1 ¨ qi`1 ´ H`d pqi, pi`1q(
,
which we evaluate along a solution of the right discrete Hamilton’s equations (29). From this, we have,406
Sk`1d pqk`1q ´ Sk
dpqkq “ pk`1 ¨ qk`1 ´ H`d pqk, pk`1q, (30)
Version September 9, 2017 submitted to Entropy 15 of 31
where pk`1 is considered to be a function of qk and qk`1. Taking derivatives with respect to qk`1, we407
obtain,408
DSk`1d pqk`1q “ pk`1 `
Bpk`1Bqk`1
¨“
qk`1 ´D2H`d pqk, pk`1q‰
,
but the term inside the parenthesis vanishes as we are restricting this to a solution of the right discrete409
Hamilton’s equations. Therefore, we have that410
DSk`1d pqk`1q “ pk`1,
which when substituted into (30) yields the discrete Hamilton–Jacobi equation,411
Sk`1d pqk`1q ´ Sk
dpqkq “ DSk`1d pqk`1q ¨ qk`1 ´ H`d pqk, DSk`1
d pqk`1qq.
3.4. Discrete Hamilton-Pontryagin principle412
Leok and Ohsawa [13] considered the discrete Hamilton’s principle and relaxed the discrete413
second-order condition,414
q1k “ q0
k`1,
and reimposed it using Lagrange multipliers pk`1, in order to derive the discrete Hamilton–Pontryagin415
principle on pQˆQq ˆQ T˚Q,416
δ”
ÿn´1
i“0Ldpq0
i , q1i q `
ÿn´2
i“0pi`1pq0
i`1 ´ q1i qı
“ 0. (31)
Here, the superscripts 0, or 1 on qi refers to the first or second slot, respectively, in QˆQ. This in turn417
yields the implicit discrete Euler–Lagrange equations,418
q1k “ q0
k`1, pk`1 “ D2Ldpq0k , q1
kq, pk “ ´D1Ldpq0k , q1
kq, (32)
where D1, D2 denote as before the partial derivative with respect to the first or second argument in419
Ld. Making the identification qk “ q0k “ q1
k´1, the last two equations define the discrete fiber derivatives,420
FL˘d : QˆQ Ñ T˚Q as given by (23) and (24). Discrete fiber derivatives induce a discrete symplectic form,421
ωLd ” pFL˘d q˚ωcan, and the discrete Lagrangian map FLd ” pFL´d q
´1 ˝ FL`d : pqk´1, qkq ÞÑ pqk, qk`1q422
and the discrete Hamiltonian map FLd ” FL`d ˝ pFL´d q´1 : pqk, pkq ÞÑ pqk`1, pk`1q preserve ωLd and423
ωcan, respectively.424
4. Information Geometry425
4.1. Statistical structure on M426
On a differentiable manifold M endowed with a metric g and a torsion-free affine connection ∇,427
the compatibility of a metric g and a connection ∇ is encoded by the cubic form 3-tensor field C “ ∇g,428
i.e., the covariant derivative of g. In a local coordinate system with basis Bi ” BBxi, the metric tensor g429
is locally represented by430
gijpxq “ gpBi, Bjq, (33)
and the components of ∇ takes the contravariant form Γlij, where431
∇BiBj “ÿ
l
Γlij Bl , (34)
Version September 9, 2017 submitted to Entropy 16 of 31
or the covariant form Γij,k, where432
Γij,k “ gp∇BiBj, Bkq “ÿ
l
glkΓlij. (35)
Torsion-freeness of ∇ implies the symmetry of its (first) two lower indices, i.e.,433
Γlijpxq “ Γl
jipxq, Γij,lpxq “ Γji,lpxq.
We can now compute the cubic form,434
CpBi, Bj, Bkq “ p∇BkgqpBi, Bjq “ BkgpBi, Bjq ´ gp∇Bk
Bi, Bjq ´ gpBi,∇BkBjq,
or in components,435
Cijk “Bgij
Bxk´ Γki,j ´ Γkj,i. (36)
When the cubic form is identically zero, ∇ is said to be parallel with respect to g. A torsion-free436
connection parallel to g is called the Levi-Civita connection p∇ associated to the given metric g:437
BkgpBi, Bjq “ gp p∇BkBi, Bjq ` gpBi, p∇Bk
Bjq. (37)
The fundamental theorem of Riemannian geometry establishes the existence and uniqueness of the438
Levi-Civita connection, which is a solution of (37), and is given by,439
pΓkij “
ÿ
l
gkl
2
ˆ
Bgil
Bxj `Bgjl
Bxi ´Bgij
Bxl
˙
.
Generalizing the notion of parallelism of a connection is the notion of conjugacy (denoted by ˚)440
between two connections. A connection ∇˚ is said to be conjugate (or dual) to ∇ with respect to g if441
BkgpBi, Bjq “ gp∇BkBi, Bjq ` gpBi,∇˚Bk
Bjq. (38)
Clearly, p∇˚q˚ “ ∇. Moreover, p∇, which satisfies (37), is special in the sense that it is self-conjugate442
p p∇q˚ “ p∇. Writing out (38):443
Bgij
Bxk “ Γki,j ` Γ˚kj,i, (39)
where Γ˚kj,i is defined analogously to (34) and (35),444
∇˚BiBj “
ÿ
l
Γ˚lij Bl ,
so that445
Γ˚kj,i “ gp∇˚BjBk, Biq “
ÿ
l
gilΓ˚lkj .
From (36) and (38), the cubic form can now be written as446
Cijk “ Γ˚kj,i ´ Γkj,i.
Clearly, Cijk “ Cjik, i.e., it is symmetric with respective to its first two indices. When both ∇ and ∇˚447
are torsion-free, this implies that448
Γkj,i “ Γjk,i, Γ˚kj,i “ Γ˚jk,i,
Version September 9, 2017 submitted to Entropy 17 of 31
then Cijk “ Cikj, which leads to C being totally symmetric in all the indices,449
Cijk “ Cikj “ Ckij “ Ckji “ Cjki “ Cjik.
Requiring that Cijk is totally symmetric imposes a compatibility condition between g and ∇, so450
that they form a Codazzi pair (see Simon, 2000), which generalizes the Levi-Civita coupling (whose451
corresponding cubic form Cijk ” 0). Lauritzen (1987) defined a statistical manifold pM, g,∇q to be a452
manifold M equipped with metric g and connection ∇ such that i) ∇ is torsion-free; ii) ∇g ” C is453
totally symmetric. Equivalently, a manifold has a statistical structure when the conjugate (with respect454
to g) ∇˚ of a torsion-free connection ∇ is also torsion-free. In this case, ∇˚g “ ´C, and the Levi-Civita455
connection ∇ “ p∇`∇˚q2.456
On a statistical manifold, one can define a one-parameter family of affine connections ∇pαq, called457
α-connections (α P R) [2]:458
∇pαq “ 1` α
2∇` 1´ α
2∇˚. (40)
Obviously, ∇p0q “ p∇ is the Levi-Civita connection, and the cubic form is given by ∇pαqg “ αC.459
The curvature/flatness of a connection ∇ is described by the Riemann curvature tensor R, defined460
as461
RpBi, BjqBk “ p∇Bi∇Bj ´∇Bj∇BiqBk.
Writing RpBi, BjqBk “ř
l RlkijBl and substituting (34), the components of the Riemann curvature tensor462
are463
Rlkijpxq “
BΓljkpxq
Bxi ´BΓl
ikpxqBxj `
ÿ
mΓl
impxqΓmjkpxq ´
ÿ
mΓl
jmpxqΓmikpxq.
By definition, Rlkij is antisymmetric when i Ø j. The covariant form of the Riemann curvature is464
Rlkij “ÿ
mglm Rm
kij.
When the connection is torsion-free, Rlkij is antisymmetric when i Ø j or when k Ø l, and symmetric465
when pi, jq Ø pl, kq. It is related to the Ricci tensor Ric by Rickj “ř
i,l Rlkijgil .466
In addition, it can be shown that the curvatures Rlkij, R˚lkij for the pair of conjugate connections467
∇,∇˚ satisfy468
Rlkij “ R˚lkij.
A connection is said to be flat when Rlkijpxq ” 0. So, ∇ is flat if and only if ∇˚ is flat. In this case, the469
manifold is said to be dually-flat, and the metric g takes on a particular form (to be discussed later).470
4.2. Divergence function and induced geometry471
A divergence function D : MˆMÑ Rě0 on a manifold M with respect to a local chart V Ď Rn is472
a C3 function satisfying473
(i) Dpx, yq ě 0,@x, y P V, with equality holding if and only if x “ y;474
(ii) Dipx, xq “ D,jpx, xq “ 0,@i, j P t1, 2, ¨ ¨ ¨ , nu;475
(iii) ´Di,jpx, xq is positive-definite.476
Here Dipx, yq “ Bxi Dpx, yq, D,ipx, yq “ ByiDpx, yq denote partial derivatives with respect to the i-th477
component of point x and of point y, respectively, and Di,jpx, yq “ BxiByjDpx, yq the second-order478
mixed derivative, and so on.479
On a manifold, divergence functions act as pseudo-distance functions that are nonnegative but480
need not be symmetric. Every divergence function induces a dualistic Riemannian structure, i.e.,481
statistical structure, which was first demonstrated by S. Eguchi (see [8]).482
Version September 9, 2017 submitted to Entropy 18 of 31
Lemma 4.1. A divergence function induces a Riemannian metric g and a pair of torsion-free conjugate483
connections ∇,∇˚ given as484
gijpxq “ ´ Di,jpx, yqˇ
ˇ
x“y ,
Γij,kpxq “ ´ Dij,kpx, yqˇ
ˇ
ˇ
x“y,
Γ˚ij,kpxq “ ´ Dk,ijpx, yqˇ
ˇ
ˇ
x“y.
The Γij,k, Γ˚ij,k are torsion-free and are conjugate with respect to the induced metric gij. Hence, the485
divergence function D induces pM, g,∇,∇˚q, which is a statistical manifold (Lauritzen [12]).486
A popular divergence function is the Bregman divergence BΦpx, yq [6], which is associated to a487
strictly convex function Φ:488
BΦpx, yq “ Φpyq ´Φpxq ´ xy´ x, BΦpxqy, (41)
where BΦ “ rB1Φ, ¨ ¨ ¨ , BnΦs denotes the exterior derivative, and x¨, ¨yn denotes the canonical pairing of489
a vector x “ rx1, ¨ ¨ ¨ , xns P Rn and a covector u “ ru1, ¨ ¨ ¨ , uns P rRn (dual to Rn), i.e.,490
xx, uy “nÿ
i“1
xiui. (42)
Where there is no danger of confusion, the subscript n in x¨, ¨yn is often omitted. A basic fact in convex491
analysis is that the necessary and sufficient condition for a smooth function Φ to be strictly convex is492
BΦpx, yq ą 0, (43)
for all x ‰ y.493
Recall that when Φ is convex, its convex conjugate, rΦ : rV Ď rRn Ñ R, is defined through the494
Legendre transform:495
rΦpuq “ xpBΦq´1puq, uy ´ΦppBΦq´1puqq, (44)
with r
rΦ “ Φ and pBΦq “ pBrΦq´1. Since rΦ is also convex, by (43), we obtain the Fenchel inequality,496
Φpxq ` rΦpuq ´ xx, uy ě 0,
for any x P V, u P rV, with equality holding if and only if497
u “ pBΦqpxq “ pBrΦq´1pxq ÐÑ x “ pBrΦqpuq “ pBΦq´1puq, (45)
or, in component form,498
ui “BΦBxi ÐÑ xi “
BrΦBui
. (46)
Using conjugate variables, we can introduce the canonical divergence AΦ : V ˆ rV Ñ R` (and499
ArΦ : rV ˆV Ñ R`),500
AΦpx, vq “ Φpxq ` rΦpvq ´ xx, vy “ ArΦpv, xq.
They are related to the Bregman divergence (41) via the relation501
BΦpx, pBΦq´1pvqq “ AΦpx, vq “ BrΦppB
rΦqpxq, vq.
Version September 9, 2017 submitted to Entropy 19 of 31
Though the Bregman divergence is not a metric, it satisfies a quadrilateral relation [23]: For any502
four points x, x1, x2, x3 P V,503
BΦpx, x1q ` BΦpx3, x2q ´ BΦpx, x2q ´ BΦpx3, x1q “ xx2 ´ x1, BΦpxq ´ BΦpx3qy .
As a special case, when x3 “ x1, BΦpx3, x1q “ 0, the above equality reduces to the Pythagorean504
(generalized cosine) relation among three points x, x1, x2 P V:505
BΦpx, x1q ` BΦpx1, x2q ´ BΦpx, x2q “ xx2 ´ x1, BΦpxq ´ BΦpx1qy .
This is the Pythagorean relation [3] for a dually-flat space. Using this relation, one can state506
minimization problems for divergence functions.507
The quadrilateral relation can be expressed in terms of the canonical divergence A as follows,508
AΦpx, uq `AΦpy, vq ´AΦpx, vq ´AΦpy, uq “ xx´ y, v´ uy,
for any four points x, y P V, v, u P rV.509
Zhang [22] introduced the α-indexed family of Φ-divergence functions DpαqΦ on V ˆV,510
DpαqΦ px, yq “4
1´ α2
ˆ
1´ α
2Φpxq `
1` α
2Φpyq ´Φ
ˆ
1´ α
2x`
1` α
2y˙˙
. (47)
Furthermore, Dp˘1qΦ px, yq is defined by taking limαÑ˘1:511
Dp´1qΦ px, yq “ Dp1qΦ py, xq “ BΦpx, yq,
Dp1qΦ px, yq “ Dp´1qΦ py, xq “ BΦpy, xq.
Note that DpαqΦ px, yq satisfies the relation (called referential duality in [22,27]),512
DpαqΦ px, yq “ Dp´αqΦ py, xq,
that is, exchanging the two points in the directed distance amounts to α Ø ´α.513
4.3. Hessian manifolds and biorthogonal coordinates514
Applying Lemma 4.1 to the Bregman divergence BΦ induces the following metric,515
gijpxq “B2ΦpxqBxiBxj ,
and the pair of torsion-free conjugate connections,516
Γij,kpxq “ 0, Γ˚ij,kpxq “B3ΦpxqBxiBxjBxk .
In this case, M is dually-flat. This yields a Hessian manifold, where g takes the form of the Hessian of a517
strictly convex function Φ.518
Hessian manifolds enjoy a special status in information geometry, as they exhibit biorthogonal519
coordinates on M that are globally affine coordinates despite the nontrivial Riemannian (Hessian)520
metric on M.521
Consider the coordinate transform x ÞÑ u,522
Bi ”B
Bui“
ÿ
l
Bxl
Bui
B
Bxl “ÿ
l
FliBl ,
Version September 9, 2017 submitted to Entropy 20 of 31
where the Jacobian matrix F is given by523
Fijpxq “Bui
Bxj , Fijpuq “Bxi
Buj,
ÿ
l
Fil Fl j “ δij, (48)
where δij is the Kronecker delta, which takes the value 1 when i “ j and 0 otherwise. If the new524
coordinate system u “ ru1, ¨ ¨ ¨ , uns (with components denoted by subscripts) is such that525
Fijpxq “ gijpxq, (49)
then the x-coordinate system and the u-coordinate system are said to be biorthogonal to each other,526
since, from the definition of the metric tensor (33),527
gpBi, B jq “ gpBi,ÿ
l
Fl jBlq “ÿ
l
Fl jgpBi, Blq “ÿ
l
Fl jgil “ δji .
In this case, we define528
gijpuq “ gpBi, B jq, (50)
which is equal to Fij, the Jacobian of the inverse coordinate transform u ÞÑ x. We also introduce the529
contravariant representation of the affine connection ∇ with respect to the u-coordinate system, and530
denote it by an unconventional notation Γrst , which is defined by531
∇BrBs “ÿ
tΓrs
t Bt;
similarly, Γ˚rst is defined by532
∇˚BrBs “
ÿ
tΓ˚rs
t Bt.
The covariant representation of the affine connections will be denoted by superscripted Γ and Γ˚,533
Γij,kpuq “ gp∇BiBj, Bkq, Γ˚ij,kpuq “ gp∇˚
BiBj, Bkq. (51)
The representation of the affine connections in the u-coordinate system (denoted by superscripts) and534
the x-coordinate system (denoted by subscripts) are related by535
Γrst puq “
ÿ
k
¨
˝
ÿ
i,j
Bxr
Bui
Bxs
BujΓk
ijpxq `B2xk
BurBus
˛
‚
BukBxt , (52)
and536
Γrs,tpuq “ÿ
i,j,k
Bxr
Bui
Bxs
Buj
Bxt
BukΓij,kpxq `
B2xt
BurBus. (53)
Similarly relations hold between Γ˚rst puq and Γ˚k
ij pxq, and between Γ˚rs,tpuq and Γ˚ij,kpxq.537
Analogous to (39), we have the following identity,538
B2xt
BusBur“BgrtpuqBus
“ Γrs,tpuq ` Γ˚ts,rpuq.
Therefore, with respect to biorthogonal coordinates, a pair of conjugate connections ∇,∇˚ satisfy,539
Γ˚ts,rpuq “ ´ÿ
i,j,k
girpuqgjspuqgktpuqΓij,kpxq, (54)
Version September 9, 2017 submitted to Entropy 21 of 31
and540
Γ˚ tsr puq “ ´
ÿ
j
gjspuqΓtjrpxq. (55)
We now investigate conditions for the existence of biorthogonal coordinates on a Riemannian541
manifold pM, gq. From its definition (49), it can easily be shown that542
Proposition 4.2. A Riemannian manifold M with metric gij admits biorthogonal coordinates if and only ifBgij
Bxk543
is totally symmetric, i.e.,544
BgijpxqBxk “
BgikpxqBxj . (56)
In this case, M is Hessian, namely, its Riemannian metric gij can be written as the Hessian (second-derivative)545
of some convex function.546
That (56) is satisfied for biorthogonal coordinates is evident by virtue of (48) and (49). Conversely,547
given (56), there must be n functions uipxq, i “ 1, 2, ¨ ¨ ¨ , n, such that,548
BuipxqBxj “ gijpxq “ gjipxq “
BujpxqBxi .
The above identity implies that there exist a function Φ such that ui “ BiΦ and, by positive definiteness549
of gij, Φ would have to be a strictly convex function! In this case, the x- and u-variables satisfy (45),550
and the pair of convex functions, Φ and its conjugate rΦ, are related to gij and gij by551
gijpxq “B2ΦpxqBxiBxj ÐÑ gijpuq “
B2rΦpuq
BuiBuj.
It follows from Proposition 4.2 that a necessary and sufficient condition for a Riemannian manifold552
to admit biorthogonal coordinates it that its Levi-Civita connection is given by553
pΓij,kpxq ”12
ˆ
Bgik
Bxj `Bgjk
Bxi ´Bgij
Bxk
˙
“12Bgij
Bxk .
From this, the following can be shown:554
Corollary 4.3. A Riemannian manifold pM, gq admits a pair of biorthogonal coordinates x and u if and only if555
there exists a pair of conjugate connections ∇ and ∇˚ such that Γij,kpxq “ 0, Γ˚rs,tpuq “ 0.556
In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of557
connections. In fact, we can now define a pair of torsion-free connections by558
Γij,kpxq “ 0, Γ˚ij,kpxq “Bgij
Bxk ,
and show that they are conjugate with respect to g, that is, they satisfy (38). This means that we select559
an affine connection ∇ such that x is its affine coordinate system. From (53), when ∇˚ is expressed in560
u-coordinates,561
Γ˚rs,tpuq “ÿ
i,j,k
girpuqgjspuqBxk
But
BgijpxqBxk `
BgtspuqBur
“ÿ
i,j
girpuqˆ
´BgjspuqBut
gijpxq˙
`BgtspuqBur
Version September 9, 2017 submitted to Entropy 22 of 31
“ ´ÿ
j
δrjBgjspuqBut
`BgtspuqBur
“ 0.
This implies that u is an affine coordinate system with respect to ∇˚. Furthermore,562
gijpuq “B2
rΦpuqBuiBuj
, Γij,kpuq “B3
rΦpuqBuiBujBuk
,
where rΦ is the convex conjugate of Φ. Therefore, biorthogonal coordinates are affine coordinates for a563
pair of dually-flat connections. On the manifold of parameterized probability density functions, if the564
x-coordinates are the natural parameters, then the u-coordinates are the expectations.565
5. Linking Information Geometry with Geometric Mechanics566
5.1. Symplectic structure on QˆQ induced from the divergence function D567
We will now establish the connection between information geometry and discrete geometric568
mechanics. The divergence function from information geometry can be viewed as a Type I generating569
function of a symplectic map, and in particular, it can be viewed as a discrete Lagrangian in the sense570
of discrete Lagrangian mechanics. More specifically, let the configuration manifold be the information571
manifold, i.e., Q “ M, and the discrete Lagrangian be the divergence function, i.e., Ld “ D. With572
this identification, we observe that the information geometric construction of symplectic structure on573
MˆM described below is nothing but the discrete symplectic structure on QˆQ given in (28) where574
the discrete Lagrangian Ld is replaced with the divergence function D.575
From information geometry, a divergence function D is given as a scalar-valued binary function on576
Q (of dimension n). We now view it as a unary function on QˆQ (of dimension 2n) that vanishes along577
the diagonal ∆Q Ă QˆQ. In this subsection, we investigate the conditions under which a divergence578
function can serve as a generating function of a symplectic structure on QˆQ. A compatible metric579
on QˆQ will also be derived. When restricted to the diagonal submanifold ∆Q, the skew-symmetric580
symplectic form will vanish, so ∆Q, which carries a statistical structure, is actually a Lagrangian581
submanifold.582
First, we fix a point x in the first slot, and a point y in the second slot of QˆQ – this results in583
two n-dimensional submanifolds of QˆQ that will be denoted, Qx “ Qˆ tyu » Q (with the y point584
fixed) and Qy “ txu ˆQ » Q (with the x point fixed), respectively. The canonical symplectic form ωx585
on the cotangent bundle T˚Qx is given by586
ωx “ÿ
i
dxi ^ dξi.
Given D, we define a map LD from Qy Ă QˆQ to T˚Qx, px, yq ÞÑ px, ξq, which is given by,587
LD : px, yq ÞÑ px,ÿ
i
Dipx, yqdxiq.
Recall that the comma indicates whether the variable being differentiated with respect to is in the first588
or second slot. It is easily checked that there exists a neighborhood of the diagonal ∆Q Ă QˆQ, such589
that the map LD is a diffeomorphism. In particular, the Jacobian of the map is given by590
˜
δij Dij0 Di,j
¸
,
which is nondegenerate in a neighborhood of the diagonal ∆Q.591
Version September 9, 2017 submitted to Entropy 23 of 31
We calculate the pullback by LD of the canonical symplectic form ωx on T˚Qx to QˆQ:592
L˚D ωx “ L˚D pÿ
i
dxi ^ dξiq “ÿ
i
dxi ^ dDipx, yq
“ÿ
i
dxi ^ÿ
j
pDijpx, yqdxj `Di,jdyjq “ÿ
ij
Di,jpx, yqdxi ^ dyj.
Here,ř
ij Dijdxi ^ dxj “ 0, since by the equality of mixed partials, Dijpx, yq “ Djipx, yq always holds.593
Similarly, we consider the canonical symplectic form ωy “ř
i dyi ^ dηi on T˚Qy and define a594
map RD from QˆQ Ñ T˚Qy, px, yq ÞÑ py, ηq, which is given by595
RD : px, yq “ py,ÿ
i
D,ipx, yqdyiq.
Using RD to pullback ωy to QˆQ yields an analogous formula:596
R˚Dωy “ ´ÿ
ij
Di,jpx, yqdxi ^ dyj.
Therefore, based on canonical symplectic forms on T˚Qx and T˚Qy, we obtained the same597
symplectic form on QˆQ598
ωDpx, yq “ ´ÿ
ij
Di,jpx, yqdxi ^ dyj. (57)
Theorem 5.1. A divergence function D induces a symplectic form ωD (57) on QˆQ which is the pullback of599
the canonical symplectic forms ωx and ωy by the maps LD and RD ,600
L˚D ωy “ÿ
ij
Di,jpx, yqdxi ^ dyj “ ´R˚D ωx. (58)
With the symplectic form ωD given as above, it is easy to check that ωD is closed:601
dωD “ÿ
ijk
"
B3DBxkBxiByj dxk ^ dxi ^ dyj `
B3DBykBxiByj dyk ^ dxi ^ dyj
*
“ 0.
It was Barndorff-Nielsen and Jupp [5] who first proposed (57) as an induced symplectic form on602
QˆQ, apart from a minus sign; they called the divergence function D a york.603
The fact that this symplectic structure coincides with the one introduced in discrete mechanics604
should come as no surprise. The Qx and Qy submanifolds are related to the two ways of viewing605
Qˆ Q as a bundle over Q, depending on whether one chooses π1 : Qˆ Q Ñ Q, pq1, q2q ÞÑ q1 or606
π2 : QˆQ Ñ Q, pq1, q2q ÞÑ q2 as the bundle projection. Then, the maps LD , RD are, up to a sign, simply607
the discrete fiber derivatives FL˘d , where the discrete Lagrangian Ld is replaced by the divergence608
function D.609
5.2. Divergence as a Type I generating function610
As we have seen previously, symplectic maps are a natural way of describing the flow of611
Hamiltonian mechanics on the cotangent bundle T˚Q. We will now consider the characterization612
of symplectic maps in terms of generating functions, and in particular, we review three different613
parameterizations based on the classification given in Goldstein [10].614
Lemma 5.2. Given pq0, q1q P QˆQ, then pq0, p0q ÞÑ pq1, p1q on T˚Q is symplectic if and only if there exists615
S1 : QˆQ Ñ R such that616
p0 “ ´D1S1, p1 “ D2S1. (59)
Version September 9, 2017 submitted to Entropy 24 of 31
To prove this, observe that617
dS1pq0, q1q “ pD1S1qdq0 ` pD2S1qdq1,
from which, we immediately obtain618
´dp0 ^ dq0 ` dp1 ^ dq1 “ 0 “ d2S1pq0, q1q “ dpD1S1q ^ dq0 ` dpD2S1q ^ dq1.
Identifying the corresponding terms yield (59).619
Type I generating functions S1 are linked with other types of generating functions via partial620
Legendre transforms. Fixing the first or second variable slot leads to, respectively, Type II or III621
generating functions, denoted S2, S3 respectively.622
Let H` be a submanifold, with local coordinates pq0, p1q, of Qˆ pT˚Qq, with local coordinates623
pq0, pq1, p1qq, where q1 is dependent on q0 and p1. Then pq0, p0q ÞÑ pq1, p1q on T˚Q is symplectic if and624
only if there exists S2 : H` Ñ R such that625
p0 “ D1S2, q1 “ D2S2. (60)
Likewise, let H´ be a submanifold, whose local coordinates are pp1, q0q, of pT˚Qq ˆQ with local626
coordinates ppq0, p0q, q1q where q0 is dependent on p0 and q1. Then pq0, p0q ÞÑ pq1, p1q on T˚Q is627
symplectic if and only if there exists S3 : H´ Ñ R such that628
q0 “ ´D1S3, p1 “ ´D2S3. (61)
In the case of discrete mechanics, the Type II generating function is denoted by H`pq0, p1q and629
the Type III generating function is denoted by H´pq1, p0q. We compute their exterior derivatives:630
dH`pq0, p1q “ D1H`pq0, p1qdq0 `D2H`pq0, p1qdp1, (62)
dH´pq1, p0q “ D1H´pq1, p0qdq1 `D2H´pq1, p0qdp0. (63)
From this, we obtain,631
0 “ d2H`pq0, p1q “ dpD1H`pq0, p1qdq0 `D2H`pq0, p1qdp1q (64)
“ dpp0dq0 ` q1dp1q “ dp0 ^ dq0 ` dq1 ^ dp1, (65)
0 “ d2H´pq1, p0q “ dpD1H´pq1, p0qdq1 `D2H´pq1, p0qdp0q (66)
“ dpp1dq1 ` q0dp0q “ dp1 ^ dq1 ` dq0 ^ dp0. (67)
Therefore, symplectic maps can be defined implicitly in terms of a Type II generating function632
H`pq0, p1q,633
q1 “ D2H`pq0, p1q, p0 “ D1H`pq0, p1q,
and a Type III generating function H´pq1, p0q,634
q0 “ D2H´pq1, p0q, p1 “ D1H´pq1, p0q.
More explicitly, these are related to the discrete Lagrangian Ldpq0, q1q, which is a Type I generating635
function, by the following partial Legendre transforms:636
H`pq0, p1q “ extq1txp1, q1y ´ Ldpq0, q1qu , (68)
H´pq1, p0q “ extq0txp0, q0y ´ Ldpq0, q1qu , (69)
Version September 9, 2017 submitted to Entropy 25 of 31
or equivalently,637
H`pq0, p1q ´ xp1, q˚1 y ` Ldpq0, q˚1 q “ 0, (70)
H´pq1, p0q ´ xp0, q˚0 y ` Ldpq˚0 , q1q “ 0. (71)
The upshot of the above discussion is that p0, p1 are Legendre dual variables with respect to q0, q1,638
whereas in the fiberwise Legendre transform FL, it is pq0, v0q, pq1, v1qwhich are dual to pq0, p0q, pq1, p0q639
– the dual correspondence is q Ø p, instead of v Ø p. As before, the two discrete Legendre dualities640
are due to the two ways of viewing QˆQ as a bundle over Q.641
In the context of information geometry, H˘ is nothing but the partial Legendre transform of the642
divergence function Dpx, yq with respect to the first or second argument. Consider the Bregman643
divergence BΦpq0, q1q,644
BΦpq0, q1q ” Φpq1q ´Φpq0q ´ xBΦpq0q, q1 ´ q0y,
and view it as a discrete Lagrangian Ldpq0, q1q. Then, its partial Legendre transform with respect to q1,645
the Type II generating function H`, is646
H`pq0, p1q “
B
q1,BBΦpq0, q1q
Bq1
F
´ BΦpq0, q1q ,
which evaluates to647
H`pq0, p1q “ BΦpq1pq0, p1q, q0q,
where648
q1 “ pBΦq´1pp1 ` BΦpq0qq,
is obtained by solving649
p1 “BBΦpq0, q1q
Bq1“ BΦpq1q ´ BΦpq0q.
By substitution, we obtain,650
H`pq0, p1q “ BΦpBΦq´1pBΦpq0q ` p1q, q0q “ BrΦpBΦpq0q, BΦpq0q ` p1q.
Note that in this case, the Legendre dual of q1 is no longer p1 “ BΦpq1q as given by the fiberwise651
Legendre map, but is rather shifted by an amount BΦpq0q. It is interesting that H` still takes the form652
of B, as does Ld. This is a special property of taking the Bregman divergence as the generating function.653
5.3. D-divergence for decoupling L and H654
In geometric mechanics, Hamiltonian and Lagrangian dynamics represent one and the same655
dynamics – they are coupled; this is because Hpq, pq and Lpq, vq are related by the fiberwise Legendre656
transform FL “ pFHq´1 – in fact they are a Legendre pair. The conservation properties of the657
Hamiltonian approach with respect to the underlying symplectic geometry and the variational658
principles that arise in the Lagrangian and Hamilton–Jacobi theories reflect two sides of the same coin.659
TQ FL // T˚Q
QˆQ
Rq
ee
FLd
99
(72)
To appreciate this, we look at the interaction of three manifolds QˆQ, T˚Q and TQ. We take660
pqk, qk`1q to be the configuration variable q at successive time-step – it is the dynamical equation that661
governs the evolution from qk to qk`1. The Hamiltonian dynamics, which is encoded in the preservation662
Version September 9, 2017 submitted to Entropy 26 of 31
of ωcan of T˚Q, governs discrete Hamiltonian flow Hpqk, pkq, through a Type I generating function663
Ldpqk, qk`1q. On the other hand, the Lagrangian flow, which is governed by the retraction map Rq, such664
as the Dirichlet-to-Neumann map induced by Jacobi’s solution Sq0pqq to the Hamilton–Jacobi equation.665
Those two dynamic updates qk Ñ qk`1 need not be identical. In mechanics, the Hamiltonian energy666
conservation system and the Lagrangian extremization system leads to one and the same dynamics,667
precisely because Lpqk, vkq and Hpqk, pkq are linked through the fiberwise Legendre transform FL at qk:668
Lpqk, vkq ` Hpqk, pkq ´ xvk, pkyqk ” 0.
In other words, L and H are perfectly coupled – with no duality gap.669
Information geometry, on the other hand, starts with a divergence (or contrast) function Dpq, v, pq670
on TQ‘ T˚Q, which measures the discrepancy between the two systems. Given Hpq, pq on T˚Q and671
Lpq, vq on TQ, we write672
Dpq, v, pq “ Lpq, vq ` Hpq, pq ´ xv, pyq.
Theorem 5.3. Let Lpq, vq and Hpq, pq be strictly convex functions, defined on TQ and T˚Q in terms of the673
variables pq, vq and pq, pq, respectively. Then, for the following statements, any two imply the rest:674
(i) D “ 0;675
(ii) Hpq, ¨q and Lpq, ¨q are (fiberwise) convex conjugate (Legendre dual) to each other;676
(iii) p “ BLBv “ FLpvq ;677
(iv) v “ BHBp “ FHppq .678
When D “ 0, ωLpξL, ¨q “ ιξL ωL “ dEL with ξL “ p 9q, 9vq, and679
ELpq, vq “ÿ
i
vi BLpq, vqBvi ´ Lpq, vq.
Then,680
dEL “ÿ
i
¨
˝
ÿ
j
B2LBqiBvj vj ´
BLBqi
˛
‚dqi `ÿ
ij
vj B2L
BviBvj dvi.
The Euler–Lagrange equations are equivalent to681
ξL “
ˆ
BELBq
,BELBv
˙
.
Our insight here is that D does not have to vanish identically. The consequence is that we do not682
require the Lagrangian dynamics (extremization dynamics) and Hamiltonian dynamics (conservation683
dynamics) to be coupled; they will be allowed to evolve independently. The function D allows us to684
study fiberwise symplectomorphisms of Dirac manifolds.685
Let us consider the case that (ii) holds, i.e., Hpq, ¨q and Lpq, ¨q are Legendre duals to each other.686
Then, the canonical divergence D can be written as the Bregman divergence BH and BL, after applying687
the fiberwise Legendre map FL or FH,688
p “BLpq, vqBv
ðñ v “BHpq, pqBp
.
This implies that,689
BLpq, v0, v1q ” Dˆ
q, v0,BLpq, vqBv1
˙
“ Lpq, v1q ´ Lpq, v0q ´
B
BLpq, v0q
Bv0, v1 ´ v0
F
,
BHpq, p0, p1q ” Dˆ
q,BHpq, p0q
Bp0, p1
˙
“ Hpq, p1q ´ Hpq, p0q ´
B
BHpq, p0q
Bp0, p1 ´ p0
F
,
Version September 9, 2017 submitted to Entropy 27 of 31
and they satisfy,690
BLpq, v0, v1q “ BHpq, p1, p0q.
This is the reference-representation biduality [23,27], which is satisfied whenever L and H are Legendre691
duals of each other.692
5.4. Variational error analysis693
Recall that we previously defined the exact discrete Lagrangian LEd (16), which is related to Jacobi’s694
solution of the Hamilton–Jacobi equation. The significance of the exact discrete Lagrangian is that it695
generates the exact discrete time flow of a Lagrangian system, but in general it cannot be computed696
explicitly. Instead, a computable discrete Lagrangian Ld is used instead to construct a discretization of697
Lagrangian mechanics, and it induces the discrete Lagrangian map FLd .698
Since discrete variational mechanics is expressed in terms of discrete Lagrangians, and the699
exact discrete Lagrangian generates the exact flow map of a continuous Lagrangian system, it is700
natural to ask whether we can characterize the order of accuracy of the Lagrangian map FLd as701
an approximation of the exact flow map, in terms of the extent to which the discrete Lagrangian702
Ld approximates the exact discrete Lagrangian LEd . This is indeed possible, and is referred to as703
variational error analysis. Theorem 2.3.1 of [15] shows that if a discrete Lagrangian Ld approximates the704
exact discrete Lagrangian LEd to order p, i.e., Ldpq0, q1; hq “ LE
d pq0, q1; hq `Ophp`1q, then the discrete705
Hamiltonian map, FLd : pqk, pkq ÞÑ pqk`1, pk`1q, viewed as a one-step method, is order p accurate.706
As mentioned above, the divergence function D from information geometry can serve as a Type I707
generating function of a symplectic map, and hence it can be viewed as a discrete Lagrangian in the708
sense of discrete Lagrangian mechanics. A divergence function also generates the Riemannian metric709
and affine connection structures on the diagonal manifold (Lemma 4.1), in addition to generating the710
symplectic structure on MˆM. Viewed in this way, a natural question is to what extent can we view711
the divergence function as corresponding to the exact Lagrangian flow of an associated continuous712
Lagrangian. We can show that713
Theorem 5.4. The exact discrete Lagrangian LEd pqp0q, qphq, hq associated with the geodesic flow, with respect714
to the induced metric g, can be approximated by a divergence function Dpqp0q, qphqq up to third order Oph3q715
accuracy,716
LEd pqp0q, qphq, hq “ hLpqp0q, vp0qq `Dpqp0q, qphqq `Oph3q,
if and only if M is a Hessian manifold, i.e., D is the Bregman divergence BΦ, for some strictly convex function717
Φ.718
Proof. Let us expand the exact discrete Lagrangian to obtain,719
LEd pqp0q, qphq, hq “ hLpq, 9qq `
h2
2
ˆ
BLBqpq, 9qq `
BLB 9qpq, 9qq ¨ :q
˙
`Oph3q
“ hLpq, vq `h2
2
ˆ
BLBqi vi `
BLBvi ai
˙
`Oph3q, (73)
where qp0q “ q, v “ 9qp0q, a “ :qp0q.720
From the definition of a divergence function:721
D,jpq, qq “ 0.
Differentiating with respect to q,722
0 “B
Bqi D,jpq, qq “ Di,jpq, qq `D,ijpq, qq,
Version September 9, 2017 submitted to Entropy 28 of 31
so723
Di,jpq, qq “ ´D,ijpq, qq.
Differentiating with respect to q again,724
B
Bqk D,ijpq, qq “ Dk,ijpq, qq `D,ijkpq, qq.
Observe that the left-hand side is the metric induced by the divergence function,725
B
Bqk D,ijpq, qq “ ´Di,jpq, qq “ gijpqq.
Expanding Dpq, q1q around q “ qp0q for q1 “ qphq:726
q1 “ q` vh`12
ah2 `Oph3q,
we obtain727
Dpq, q1q “h2
2gijpqq `
h3
4gijpqqpviaj ` vjaiq `
h3
6Γijkpqqvivjvk `Oph4q,
where728
Dpq, qq “ 0, D,ipq, qq “ 0, D,ijpq, qq ” gijpqq “ ´Di,jpq, qq, D,ijkpq, qq ” Γijkpqq.
Clearly, gij “ gji, and729
Γijk “ Γikj “ Γkij “ Γkji “ Γjki “ Γjik.
Comparing the corresponding terms in powers of h, we obtain,730
Lpq, vq “12
gijvivj, (74)
vi BLBxi pq, vq ` ai BL
Bvi pq, vq “12
gijpaivj ` ajviq `13
Γijkvivjvk. (75)
Substituting (74) into (75) yields731
Bgij
Bqk “ Γijk,
where,732
Bgij
Bqk “Bgik
Bqj .
This, according to Proposition 4.2, demonstrates that the manifold M is Hessian, and hence dually-flat.733
So, for the expansions to agree to Oph3q, the inducing divergence function D must be the Bregman734
divergence BΦ.735
6. Summary736
In this paper, we show the differences and connections between geometric mechanics and737
information geometry in canonically prescribing differential geometric structures on a smooth manifold738
Q. The Legendre transform plays crucial roles in both; however, they serve very different purposes.739
In geometric mechanics, the fiberwise Legendre map serves to link the cotangent bundle T˚Q with740
tangent bundle TQ, whereas in information geometry, the Legendre transform relates the pair of741
biorthogonal coordinates, which are special coordinates on a dually-flat manifold Q. More specifically,742
FL (or its inverse FH) is invoked to establish the isomorphism between T˚Q Ø TQ in geometric743
mechanics, whereas in information geometry, a Hessian metric g built upon a convex function on Q is744
Version September 9, 2017 submitted to Entropy 29 of 31
used for the correspondence between two coordinate systems on Q, and also for potentially (but not745
necessarily) establishing a correspondence between TQ and T˚Q.746
The link between information geometry and discrete mechanics is much stronger when one747
considers the discrete version (as opposed to the traditional, continuous version) of geometric748
mechanics. Both endow a symplectic structure ωˆ on QˆQ, through the use of a discrete Lagrangian749
Ld in the case of geometric mechanics and a divergence function D in the case of information geometry750
– in fact they are both Type I generating functions for inducing ωˆ on Q ˆ Q via pullback from751
the canonical symplectic structure ωcan on T˚Q. Using the Legendre transform, Type II generating752
functions can be constructed, which lead to the (right) discrete Hamiltonian H` in geometric mechanics753
and to the dual divergence function in information geometry.754
Our analyses draw a distinction between the fiberwise Legendre map (which is used in continuous755
mechanics setting), the Legendre transform between biorthogonal coordinates (which is used in756
information geometry), and the Legendre transform between Type I and Type II generating functions757
(which is used in the setting of both discrete geometric mechanics and information geometry). The758
distinctions are more prominent when one considers the Pontryagin bundle TQ‘ T˚Q. There, we759
can construct a divergence function that actually measures the duality gap between the Lagrangian760
function and the Hamiltonian function that generate a pair of (forward and backward) Legendre maps.761
In so doing, we demonstrate that information geometry can be viewed as an extension of geometric762
mechanics based on Dirac mechanics and geometry, with a full-blown duality between the Lagrangian763
and Hamiltonian components.764
7. Discussion and Future Directions765
Noda [16] showed that, with respect to the symplectic structure ωˆ on QˆQ, the Hamiltonian766
flow of the canonical divergence A induces geodesic flows for ∇ and ∇˚. He interpreted biorthogonal767
coordinates as a single coordinate system on QˆQ, in a way that is consistent with treating A as the768
Type I generating function on QˆQ. It remains unclear how the resulting Hamiltonian flow is related769
to dynamical flow on the Dirac manifold.770
In another related work, Ay and Amari [4] sought to characterize the canonical form of divergence771
functions for general (non dually-flat) manifolds. They investigated the retraction map Rq : tqu ˆQ Ñ772
TqQ which we discussed in Section 2.5, and used the exponential map associated to any torsion-free773
affine connection ∇ on TQ. This approach, based on parallel transport, in essence generates a semispray774
on TQ, and is quite different from characterizing the dynamics using the Hamilton flow on T˚Q. Note775
that even though one may define a symplectic structure (through pullback) on TQ as well, [4] treats776
the semispray on TQ as the primary geometric object. Future research will clarify its relation to our777
approach, which is based on defining a symplectic structure on QˆQ directly.778
Finally, comparing information geometry with geometric mechanics may shed light on universal779
machine learning algorithms. In machine learning or state estimation applications, we wish to have the780
estimated distribution be influenced by the observations, so that the estimated distribution eventually781
becomes consistent with the observed data. Let txiu denote the sequence of predictions by (possibly782
a series of) model distributions X , and let tyiu denote the actual data generated by an unknown783
distribution Y that we are trying to estimate. In practice, the divergence functions are constructed784
so that the pseudo-distance between two distributions X and Y can be computed using only complete785
information about X and samples from Y . As such, we can measure the mismatch between the current786
prediction xi and the actual data yi using Dpxi, yiq, since the asymmetry in the definition of D is such787
that we only require samples yi from the true but unknown distribution. So, adding a momentum term788
to ensure gentle change in model predictions, a possible choice of a discrete Lagrangian for generating789
the discrete dynamics for the machine learning application might be given by790
Lpxi, xi`1q “ Dpxi, xi`1q `Dpxi`1, yi`1q,
Version September 9, 2017 submitted to Entropy 30 of 31
where the first term can be interpreted as the action associated with the kinetic energy, and the second791
term is the action associated with the potential energy. By construction, the term Dpxi, yiq vanishes792
when the prediction xi is consistent with the actual observation yi, and it is positive otherwise, so the793
term Dpxi, yiq can be viewed as a potential energy term that penalizes mismatch between the estimated794
distribution and the observational data. Our variational error analysis may thus shed light on an795
asymptotic theory of inference where sample size N Ñ8 is akin to discretization step h Ñ 0.796
The link between geometric mechanics and information geometry, as revealed through our797
present investigation, is still rather preliminary. The possibility of a unified mathematical framework798
for information and mechanics is intriguing and remains a challenge for future research.799
Acknowledgements800
We thank the anonymous reviewers for helping to improve this paper. The first author is801
supported by NSF grants CMMI-1334759 and DMS-1411792. The second author is supported by802
DARPA/ARO Grant W911NF-16-1-0383.803
804
1. Abraham, R.; Marsden, J. Foundations of mechanics; Benjamin/Cummings Publishing Co. Inc. Advanced805
Book Program: Reading, Mass., 1978. Second edition, revised and enlarged, With the assistance of Tudor806
Ratiu and Richard Cushman.807
2. Amari, S. Differential-geometrical methods in statistics; Vol. 28, Lect. Notes Stat., Springer-Verlag, New York,808
1985; pp. v+290.809
3. Amari, S.; Nagaoka, H. Methods of information geometry; Vol. 191, Translations of Mathematical Monographs,810
American Mathematical Society, Providence, RI; Oxford University Press, Oxford, 2000; pp. x+206.811
Translated from the 1993 Japanese original by Daishi Harada.812
4. Ay, N.; Amari, S. A novel approach to canonical divergences within information geometry. Entropy 2015,813
17, 8111–8129.814
5. Barndorff-Nielsen, O.E.; Jupp, P.E. Yorks and symplectic structures. Journal of Statistical Planning and815
Inference 1997, 63, 133–146.816
6. Bregman, L.M. The relaxation method of finding the common point of convex sets and its application817
to the solution of problems in convex programming. USSR Computational Mathematics and Physics 1967,818
7, 200–217.819
7. Eguchi, S. Second order efficiency of minimum contrast estimators in a curved exponential family. Ann.820
Stat. 1983, 11, 793–803.821
8. Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.822
9. Fei, T.; Zhang, J. Interaction of Codazzi couplings with (para)-Kähler geometry. Results Math. 2017. In823
press.824
10. Goldstein, H. Classical mechanics, second ed.; Addison-Wesley Publishing Co., Reading, Mass., 1980; pp.825
xiv+672. Addison-Wesley Series in Physics.826
11. Lall, S.; West, M. Discrete variational Hamiltonian mechanics. J. Phys. A 2006, 39, 5509–5519.827
12. Lauritzen, S. Statistical manifolds. In Differential Geometry in Statistical Inference; Amari, S.;828
Barndorff-Nielsen, O.; Kass, R.; Lauritzen, S.; Rao, C., Eds.; IMS: Hayward, CA, 1987; Vol. 10, IMS829
Lect. Notes, pp. 163–216.830
13. Leok, M.; Ohsawa, T. Variational and Geometric Structures of Discrete Dirac Mechanics. Found. Comput.831
Math. 2011, 11, 529–562.832
14. Marsden, J.E.; Ratiu, T.S. Introduction to mechanics and symmetry, second ed.; Vol. 17, Texts in Applied833
Mathematics, Springer-Verlag, New York, 1999; pp. xviii+582. A basic exposition of classical mechanical834
systems.835
15. Marsden, J.; West, M. Discrete mechanics and variational integrators. Acta Numer. 2001, 10, 317–514.836
16. Noda, T. Symplectic structures on statistical manifolds. J. Aust. Math. Soc. 2011, 90, 371–384.837
17. Rockafellar, R.T. Convex analysis; Princeton Mathematical Series, No. 28, Princeton University Press,838
Princeton, N.J., 1970; pp. xviii+451.839
Version September 9, 2017 submitted to Entropy 31 of 31
18. Shima, H. The geometry of Hessian structures; World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ,840
2007; pp. xiv+246.841
19. Tao, J.; Zhang, J. Transformations and coupling relations for affine connections. Differ. Geom. Appl. 2016,842
49, 111–130.843
20. Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part II: Variational structures. J.844
Geom. Phys. 2006, 57, 209–250.845
21. Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems.846
J. Geom. Phys. 2006, 57, 133–156.847
22. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195.848
23. Zhang, J. Dual scaling of comparison and reference stimuli in multi-dimensional psychological space. J.849
Math. Psych. 2004, 48, 409–424.850
24. Zhang, J.; Matsuzoe, H. Dualistic Riemannian Manifold Structure Induced from Convex Functions. In851
Advances in Applied Mathematics and Global Optimization: In Honor of Gilbert Strang; Gao, D.; Sherali, H., Eds.;852
Springer: Boston, MA, 2009; pp. 437–464.853
25. Zhang, J.; Li, F. Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence854
Functions. In Geometric Science of Information, Paris, France, August 28-30, 2013. Proceedings; Nielsen, F.;855
Barbaresco, F., Eds.; Springer: Berlin, Heidelberg, 2013.856
26. Zhang, J. Divergence Functions and Geometric Structures They Induce on a Manifold. In Geometric Theory857
of Information; Nielsen, F., Ed.; Springer, 2014; pp. 1–30.858
27. Zhang, J. Reference duality and representation duality in information geometry. AIP Conference Proceedings859
2015, 1641, 130–146.860
c© 2017 by the authors. Submitted to Entropy for possible open access publication under the terms and conditions861
of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).862