semantic scholar › a5e5 › ac9771d2e3c6...version september 9, 2017 submitted to entropy 3 of 31...

Article

The Connection between Information Geometry andGeometric Mechanics

Melvin Leok 1,: ID and Jun Zhang 2,:*1 University of California, San Diego; [email protected] University of Michigan, Ann Arbor; [email protected]* Correspondence: [email protected]; Tel.: +1-734-763-6161: These authors contributed equally to this work.

Academic Editor: nameVersion September 9, 2017 submitted to Entropy

Abstract: The divergence function in information geometry, and the discrete Lagrangian in discrete1

geometric mechanics each induce a differential geometric structure on the product manifold QˆQ.2

We aim to investigate the relationship between these two objects, and the fundamental role that3

duality, in the form of Legendre transforms, play in both fields. By establishing an analogy between4

these two approaches, we will show how a fruitful cross-fertilization of techniques arise. In particular,5

we establish, through variational error analysis that the divergence function agrees with exact discrete6

Lagrangian up to third order if and only if Q is a Hessian manifold.7

Keywords: Lagrangian, Hamiltonian, Legendre map, symplectic form, divergence function,8

generating function, Hessian manifold9

1. Introduction10

Information geometry and geometric mechanics each induce geometric structures on an arbitrary11

manifold Q, and we investigate the relationship between these two approaches. More specifically, we12

study the interaction of three objects: TQ, the tangent bundle on which a Lagrangian function Lpq, vq13

is defined; T˚Q, the cotangent bundle on which a Hamiltonian function Hpq, vq is defined; and QˆQ,14

the product manifold on which the divergence function (from the information geometric perspective)15

or the Type I generating function (from the geometric mechanics perspective) is defined. In discrete16

mechanics, while the QˆQ ÐÑ T˚Q correspondence is via a symplectomorphism given by the time-h17

flow map associated with the Hamiltonian Hpq, pq, and the QˆQ ÐÑ TQ correspondence is via the18

map relating the boundary-value and initial-value formulation of the Euler–Lagrange flow, it is the19

correspondence between TQ ÐÑ T˚Q through the fiberwise Legendre map based on L or H that20

actually serves to couple the Hamiltonian flow with Lagrangian flow, leading to one and the same21

dynamics. We propose a decoupling of the Lagrangian and Hamiltonian dynamics through the use22

of a divergence function Dpq, v, pq defined on the Pontryagin bundle TQ‘ T˚Q that measures the23

discrepancy (or duality gap) between Hpq, pq and Lpq, vq. We also establish, through variational error24

analysis that the divergence function agrees with exact discrete Lagrangian up to third order if and25

only if Q is a Hessian manifold.26

Geometric mechanics investigates the equations of motion using the Lagrangian, Hamiltonian,27

and Hamilton–Jacobi formulations of classical Newtonian mechanics. Two apparently different28

principles were used in those formulations: the principle of conservation (energy, momentum, etc)29

leading to Hamiltonian dynamics and the principle of variation (least action) leading to Lagrangian30

dynamics. The conservation properties of the Hamiltonian approach are with respect to the underlying31

symplectic geometry on the cotangent bundle, whereas the variational principles that result in the32

Submitted to Entropy, pages 1 – 31 www.mdpi.com/journal/entropy

http://www.mdpi.com

https://orcid.org/0000-0002-8326-0830

http://www.mdpi.com/journal/entropy

Version September 9, 2017 submitted to Entropy 2 of 31

Euler–Lagrange equation of motions and the Hamilton–Jacobi equations reflect the geometry of the33

semispray on the tangent bundle. Lagrangian and Hamiltonian mechanics reflect two sides of the same34

coin – that they describe the identical dynamics on the configuration space (base manifold) is both35

remarkable and also to be expected due to their construction: because the Lagrangian and Hamiltonian36

are dual to each other, and related via the Legendre transform.37

Information geometry [2,3] in the broadest sense of the term, provides a dualistic Riemannian38

geometric structure that is induced by a class of functions called divergence functions, which essentially39

provide a method of smoothly measuring a directed distance between any two points on the manifold,40

where the manifold is the space of probability densities. It arises in various branches of information41

science, including statistical inference, machine learning, neural computation, information theory,42

optimization and control, etc. Various geometric structures can be induced from divergence functions,43

including metric, affine connection, symplectic structure, etc., and this is reviewed in [26]. Convex44

duality and the Legendre transform play a key role in both constructing the divergence function and45

characterizing the various dualities encoded by information geometry [22].46

Given that geometric mechanics and information geometry both prescribe dualistic geometric47

structures on a manifold, it is interesting to explore the extent to which these two frameworks are48

related. In geometric mechanics, the Legendre transform provides a link between the Hamiltonian49

function that is defined on the cotangent bundle T˚Q, with the Lagrangian function that is defined on50

the tangent bundle TQ, whereas in information geometry, it provides a link between the biorthogonal51

coordinates of the base manifold Q if it is dually-flat and exhibits the Hessian geometry. To understand52

their deep relationship, it turns out that we need to resort to the discrete formulation of geometric53

mechanics, and investigate the product manifold Q ˆ Q. The basic tenet of discrete geometric54

mechanics is to preserve the fact that Hamiltonian flow is a symplectomorphism, and construct55

discrete time maps that are symplectic. This results in two ways of viewing discrete-time mechanics,56

either as maps on QˆQ or T˚Q, which are related via discrete Legendre transforms. The shift in focus57

from T˚Q to QˆQ precisely lends itself to establishing a connection to information geometry, as the58

divergence function is naturally defined on QˆQ, and in both information geometry and discrete59

geometric mechanics, induces a symplectic structure on Qˆ Q. This is the basic observation that60

connects geometric mechanics and information geometry, and we will explore the implications of this61

connection in the paper.62

Our paper is organized as follows. Section 2 provides a contemporary viewpoint of geometric63

mechanics, with Lagrangian and Hamiltonian systems discussed in parallel with one another, in64

terms of geometry on TQ and T˚Q, respectively, including a discussion of Dirac mechanics on the65

Pontryagin bundle TQ‘ T˚Q, which provides a unified treatment of Lagrangian and Hamiltonian66

mechanics. Section 3 summarizes the results of the discrete formulation of geometric mechanics, which67

is naturally defined on the product manifold QˆQ. Section 4 is a review of now-classical information68

geometry, including the Riemannian metric and affine connections on TQ, and the manner in which the69

divergence function naturally induces dualistic Riemannian structures. The special cases of Hessian70

geometry and biorthogonal coordinates are highlighted, showing how the Legendre transform is71

essential for characterizing dually-flat spaces. Section 5 starts with a presentation of the symplectic72

structure on QˆQ induced by a divergence function, which is naturally identified with the Type I73

generating function on it. We follow up by investigating its transformation into a Type II generating74

function (which plays a key role in discrete Hamiltonian mechanics). We then propose to decouple75

the discrete Hamiltonian and Lagrangian dynamics by using the divergence function to measure76

their duality gap. Finally, we perform variational error analysis to show that on a dually-flat Hessian77

manifold, the Bregman divergence is third-order accurate with respect to the exact discrete Lagrangian.78

Section 6 closes with a summary and discussion.79


2. A Review of Geometric Mechanics80

Consider an n-dimensional configuration manifold Q, with local coordinates pq1, . . . , qnq. The81

Lagrangian formulation of mechanics is defined on the tangent bundle TQ, in terms of a Lagrangian82

L : TQ Ñ R. From this, one can construct an action integral S which is a functional of the curve83

q : rt1, t2s Ñ Q, given by84

Spqq “ż t2

t1

Lpqptq, 9qptqq dt.

Then, Hamilton’s variational principle states that,85

δSpqq “ 0, (1)

where the variation δS is induced by an infinitesimal variation δq of the trajectory q, subject to the86

condition that the variations vanish at the endpoints, i.e., δqpt1q “ δqpt2q “ 0. Applying standard87

results from the calculus of variations, we obtain the following Euler–Lagrange equations of motion,88

ddtBLB 9qi ´

BLBqi “ 0. (2)

The Hamiltonian formulation of mechanics is defined on the cotangent bundle T˚Q, and the89

fiberwise Legendre transform, FL : TQ Ñ T˚Q, relates the tangent bundle and cotangent bundle as90

follows,91

pqi, 9qiq ÞÑ pqi, piq “

ˆ

qi,BLB 9qi

˙

,

where pi is the conjugate momentum to qi:92

p “BLB 9q

. (3)

The term fiberwise is used to emphasize the fact that FL establishes a pointwise correspondence between93

TqQ and T˚q Q for any point q on Q. The cotangent bundle forms the phase space, on which one can94

define a Hamiltonian H : T˚Q Ñ R,95

Hpq, pq “ xp, 9qpq, pqy ´ Lpq, 9qpq, pqq,

where 9q is viewed as a function of pq, pq by inverting the Legendre transform (3), and96

xp, vyq “ÿ

i

pivi

denotes the duality or natural pairing between a vector v and covector p at the point q P Q. A97

straightforward calculation shows that98

BHBpi

“ 9qi , (4)

and99BHBqi “ ´

BLBqi .

From these, we transform the Euler–Lagrange equations into Hamilton’s equations,100

dqi

dt“BHBpi

,dpidt“ ´

BHBqi . (5)


The canonical symplectic form ωcan on T˚M can be identified with a quadratic form induced by the101

skew-symmetric matrix J, i.e., ωcanpv, wq “ vT Jw. With that identification, Hamilton’s equations can102

be expressed as,103«

9q9p

ff

“

«

0 I´I 0

ff «

BHBqBHBp

ff

“ J

«

BHBqBHBp

ff

.

Alternatively, Hamilton’s equations (5) can be derived using Hamilton’s phase space variational104

principle, which states that,105

δ

ż t2

t1

rxp, 9qy ´ Hpq, pqs dt “ 0,

for infinitesimal variations δq that vanish at the endpoints. The infinitesimal variation of the integral106

is computed by differentiating under the integral, integrating by parts, and using the fact that the107

infinitesimal variations δq vanish at the endpoints, which yields:108

ż t2

t1

ˆ

xδp, 9qy ` xp, δ 9qy ´BHBq

δq´BHBp

δp˙

dt “ż t2

t1

„ˆ

´ 9p´BHBq

˙

δq`ˆ

9q´BHBp

˙

δp

dt,

and by the fundamental theorem of the calculus of variations, which states that the integral is stationary109

only when the terms in the parentheses multiplying into the independent variations δq and δp vanish,110

we recover Hamilton’s equations (5).111

Lagrangian and Hamiltonian mechanics are typically viewed as different representations of the112

same dynamical system, with the Legendre transform relating the two formulations. Here, the Legendre113

transform FL (with FH as its inverse) refers to both the map relating two sets of variables, pq, 9qq with114

pq, pq, as well as the relationship between two functions, the Lagrangian Lpq, 9qq and the Hamiltonian115

Hpq, pq. The Legendre transform links pairs of convex conjugate functions; in classical mechanics,116

the Lagrangian L and Hamiltonian H are always related in this sense of forming a convex pair. The117

requirement that Lpq, 9qq be strictly convex in the variable 9q is referred to as hyperregularity. When118

the Lagrangian is positive homogeneous (or singular), the Legendre transform yields a Hamiltonian119

function that is identically zero, which means that in such cases, the Hamiltonian analogue of the120

Lagrangian system does not exist, which is problematic in the context of analytic mechanics. In order121

to address such degeneracy, it is necessary to consider Dirac mechanics on Dirac manifolds, which is a122

simultaneous generalization of Lagrangian and Hamiltonian mechanics.123

In geometric mechanics, including the contemporaneous Dirac formulation, the Lagrangian L124

and Hamiltonian H are always coupled via the fiberwise Legendre transform FL. In information125

geometry, it is a well-known fact that one can construct the divergence function (to be defined later),126

which captures the departure from such perfect coupling. In other words, we can view Lagrangian and127

Hamiltonian systems as two separate systems, which are endowed with their own dynamics and are in128

some sense dual to each other, and we then use the divergence function to measure their duality gap.129

For this reason, we will review the Lagrangian and Hamiltonian formulation of mechanics in terms130

of Lpq, 9qq and Hpq, pq, respectively, without necessarily assuming that the Lagrangian and Hamiltonian are131

related by the Legendre transform.132

2.1. Lagrangian mechanics as an extremization system on TQ133

As noted previously, the Euler–Lagrange equations (2) arise from the stationarity conditions that134

describe the extremal curves of the action integral, over the class of varied curves that fix the endpoints.135

Carrying out the differentiation in (2) explicitly yields,136

ÿ

j

B2LBviBvj

dvj

dt`ÿ

j

vj B2L

BqjBvi ´BLBqi “ 0. (6)


The fundamental tensor gij associated with the Lagrangian Lpq, vq is given by,137

gijpq, vq “B2Lpq, vqBviBvj ,

which is assumed to be positive-definite, i.e., the Lagrangian L is hyperregular. Let gil denote the138

matrix inverse of gij, then (6) can be written as139

d2qi

dt2 ` 2Giˆ

q,dqdt

˙

“ 0, (7)

where140

Gipq, vq “ÿ

l

gil

2

˜

ÿ

k

B2LBqkBvl vk ´

BLBql

¸

.

So, (7) with the above Gi are Euler–Lagrange equations in disguise, and its solution is an extremal141

curve of the action integral.142

Recall that a smooth curve on Q can be lifted to a curve on TQ in a natural way: a curve143

t ÞÑ qptq P Q becomes t ÞÑ pqptq, 9qptqq P TQ. Given an arbitrary Gipq, vq, the system of equations (7)144

specify a family of curves, called a semispray. As seen above, semisprays arise naturally in variational145

calculus as extremal curves of the action integral associated with a Lagrangian.146

Semisprays can be more generally described by a vector field. Recall that a vector field on Q is147

a section of TQ. Now, consider a vector field on the tangent bundle TQ; it is a section of the double148

tangent bundle TpTQq. The integral surfaces of the semispray induces a decomposition of the total149

space TpTQq into the horizontal subspace HpTQq and the vertical subspace VpTQq, which defines an150

Ehresmann connection. A vector on TQ encodes the second-order derivative of curves on Q, and a151

semispray defines the following vector field V on TQ:152

Vpq, vq “ÿ

i

˜

vi B

Bqi

ˇ

ˇ

ˇ

ˇ

pq,vq´ 2Gipq, vq

B

Bvi

ˇ

ˇ

ˇ

ˇ

pq,vq

¸

,

where the factor ´2 is there by convention. The integral curve of the semispray satisfies the153

second-order ODE (7), and we say that a semispray is a vector field on the tangent bundle TQ154

which encodes a second-order system of differential equations on the base manifold Q.155

A semispray is called a full spray if the spray coefficients Gi satisfy156

Gipq, λvq “ λ2Gipq, vq,

for λ ą 0. In this case, the integral curve remains invariant under reparameterization by a positive157

number, i.e., it satisfies homogeneity. When the semispray becomes a (full) spray, the Lagrange158

geometry becomes Finsler geometry, and the fundamental tensor gijpq, vq becomes the Finsler–Riemann159

metric tensor (which includes the Riemann metric as a special case).160

As noted above, a semispray induces an Ehresmann connection on Q and this connection is161

torsion-free and typically nonlinear. Conversely, given a torsion-free connection, one can construct a162

semispray. The connection is homogenous if and only if the semispray is a full spray. Moreover, if the163

spray is affine, then the connection is affine as well – an affine spray Gi takes the form164

Gi “12

ÿ

jk

Γijkpqq vj vk,

where Γijk is referred to as the affine connection.165


To summarize, Lagrangian dynamics is related to action minimization by the Euler operator, and166

leads to a semispray on the configuration manifold Q. Under suitable conditions, the Lagrangian167

function Lpq, vq defined on TQ will lead to a torsion-free but generally nonlinear connection, and an168

affine connection only for a very special form of Lagrangian.169

2.2. Hamiltonian as a conservative system on T˚Q170

Given a Hamiltonian H : T˚Q Ñ R, we consider the Hamiltonian vector field XH P XpT˚Qq defined171

by172

XH “

ˆ

BHBpi

,´BHBqi

˙

. (8)

It is straightforward to verify that H “ const along the dynamical flow of XH :173

dHdt“ÿ

i

ˆ

BHBqi

dqi

dt`BHBpi

dpidt

˙

“ 0.

So, a Hamiltonian vector field XH advects the Hamiltonian H along its flow, so that H is constant174

along solution curves, which implies that the Lie derivative L of H along the flow of XH vanishes,175

LXH H “ 0.

Formally, starting from the tautological 1-form θ “ř

i pi dqi on Q, one obtains a 2-form ω, called176

the Poincaré 2-form,177

ω “ ´dθ “ÿ

i

dqi ^ dpi,

which is the canonical symplectic form on T˚Q:178

ωpX, Yq “ÿ

i

pdqi ^ dpiqpX, Yq “ÿ

i

ˆ

BXBqi

BYBpi

´BXBpi

BYBqi

˙

,

where X, Y are vector fields on T˚Q.179

More generally, given a Hamiltonian H along with a symplectic form ω, which is, by definition, a180

closed, nondegenerate 2-form, one obtains the Hamiltonian vector field XH on T˚Q, defined in abstract181

notation by182

ιXH ω “ dH, (9)

or equivalently in a more familiar notation,183

ωpXH , ¨q “ dHp¨q.

One can define the Poisson bracket t¨, ¨u of two functions F and G by using their respective184

Hamiltonian vector fields and the symplectic form,185

tF, Gu ” ωpXF, XGq.

For the canonical symplectic form, it has the following coordinate expression,186

tF, Gu “ÿ

i

ˆ

BFBqi

BGBpi

´BFBpi

BGBqi

˙

.

In this way, Hamilton’s equations can be expressed in terms of the Poisson bracket as follows,187

dqi

dt“ tqi, Hu,

dpi

dt“ tpi, Hu. (10)


By Darboux’s theorem, it is always possible to choose local coordinates pq, pq on T˚Q, referred to188

as canonical coordinates, such that the symplectic form has the expression ω “ř

i dqi ^ dpi. In these189

coordinates, Hamilton’s equations defined by (9) and (10) take the form given in (8).190

Note that any smooth function H on T˚Q induces a Hamiltonian vector field. An arbitrary vector191

field X on T˚Q is locally Hamiltonian, i.e., induced by a smooth function H, if ιXω is closed, i.e.,192

LXω “ 0. In addition, a Hamiltonian vector field preserves the volume form Ω, i.e.,193

LXpΩq “ 0,

where Ω is the n-fold exterior product of ω,194

Ω “p´1q

npn´1q2

n!ω^ω ¨ ¨ ¨ ^ω.

2.3. Symplectic maps and symplectic flows195

A symplectic map is a diffeomorphism of T˚Q that preserves its symplectic structure ω. We first196

consider a one-parameter family of symplectic maps generated by the flow map FX : T˚Q Ñ T˚Q of197

a vector field X P XpT˚Qq (where X denotes a section). Since the entire family of symplectic maps198

leave ω invariant, it follows that £Xω “ 0. It can be shown (using Cartan’s magic formula, and the fact199

that ω is closed) that a vector field X P XpT˚Qq is symplectic if iXω is closed, i.e., dpiXωq “ 0. By the200

Poincaré lemma, this implies that iXω is locally exact, that is, in the neighborhood of any point, there201

exists some function H : T˚Q Ñ R such that iXω “ dH. So there is always locally exists a Hamiltonian202

H : T˚Q Ñ R that generates a vector field X whose flow is symplectic with respect to ω.203

More generally, a diffeomorphism φ : M1 Ñ M2 is a symplectic map from a symplectic space204

pM1, ω1q to another space pM2, ω2q if:205

φ˚ω2 “ ω1, (11)

where ω1, ω2 are the symplectic forms on M1, M2, respectively. The above condition (11) holds if and206

only if for any functions f , g:207

(i) t f , gu ˝ φ “ t f ˝ φ, g ˝ φu,208

(ii) φ˚XH “ XH˝φ.209

With respect to Darboux coordinates about a point z P M1, the condition (11) that a map φ : M1 Ñ M2210

is symplectic can be expressed locally by pDzφqT JpDzφq “ J, where DzF denotes the Jacobian of φ at z.211

A canonical transformation of T˚M is an automorphism φ : T˚Q Ñ T˚Q,212

φ :

#

pq, pq Ñ pq1qipq1, ¨ ¨ ¨ , qn, p1, ¨ ¨ ¨ , pnq,

pq, pq Ñ pp1qipq1, ¨ ¨ ¨ , qn, p1, ¨ ¨ ¨ , pnq,

such that213

ω “ÿ

i

dqi ^ dpi “ÿ

i

dq1i ^ dp1i.

The significance of canonical transformations is that they preserve the form of Hamilton’s equations,214

and one can check that an automorphism φ is canonical by verifying that pDzφqT JpDzφq “ J in a215

Darboux coordinate chart.216

2.4. Symplectic structure on TQ pulled back from T˚Q217

If we endow T˚Q with the canonical symplectic form, we can construct a symplectic form on TQ218

in such a way that these two spaces are symplectomorphic.219


The mapping between TQ and T˚Q can be constructed in two different ways, Case I involves the220

Legendre transform:221

TQ ÐÑ T˚Qpq, vq ÞÑ

´

q, BLpq,vqBv

¯

´

q, BHpq,pqBp

¯

Ð[ pq, pq

(12)

and Case II involves the Riemannian metric tensor g (on Q):222

TQ ÐÑ T˚Qpq, vq ÞÑ

´

q,ř

j gijvj¯

´

q,ř

j gij pj

¯

Ð [ pq, pq

(13)

Note that we say that g is a Riemannian metric on Q when g acts on a pair of tangent vectors at the223

tangent space TqQ at a point q of Q; it can be viewed as a p0, 2q-tensor that maps TQˆ TQ Ñ R. On224

the other hand, the symplectic form ω is a skew-symmetric p0, 2q-tensor that acts on a pair of tangent225

vectors on T˚Q, so it maps TpT˚Qq ˆ TpT˚Qq Ñ R.226

Case I. Given the Lagrangian L : TQ Ñ R, this induces the fiberwise Legendre transform227

FL : TQ Ñ T˚Q, which is given by pq, vq ÞÑ pq, pq “´

q, BLBv

¯

. If L is hyperregular, then this map is a228

diffeomorphism. If we endow TQ with the pullback symplectic form pFLq˚ω, which is given by229

ÿ

ij

B2LBvjBvi dqi ^ dvj `

ÿ

ij

B2LBqiBvj dqi ^ dqj,

then the Legendre transform is a symplectomorphism (by construction).230

Case II. The Riemannian metric g induces the musical isomorphisms 5 : TQ Ñ T˚Q and 7 : T˚Q Ñ231

TQ between TQ and T˚Q, which are the operations that lower and raise the index, respectively. If we232

endow TQ with the pullback symplectic form 5˚ω, which is given by233

ω “ dq^ dppq, vq “ÿ

i,j

gijdqi ^ dvj `ÿ

ijk

Bgij

Bqk vi dqj ^ dqk,

then the musical isomorphism is a symplectomorphism (by construction).234

Link between Case I and Case II. It is possible that the two ways of identifying TQ Ø T˚Q may235

be the same; this happens when g on TQ coincides with the second derivatives of Lpq, vq with respect236

to the v-variable:237

gijpq, vq “B2Lpq, vqBviBvj

,

assuming L is hyperregular. The inverse of g, denoted g, can be obtained from238

gijpq, pq “B2Hpq, pqBpiBpj

,

using the Hamiltonian Hpq, pq defined on T˚Q. Note that when the Lagrangian has the form Lpq, vq “239

12 vT Mpqqv ´ Vpqq, this corresponds to the Riemannian metric g being given by the kinetic energy240

metric Mpqq.241


2.5. Hamilton-Jacobi theory and Dirichlet-to-Neumann map242

In classical mechanics, the Hamilton–Jacobi equation is first introduced as a partial differential243

equation that the action integral satisfies. Recall that the action integral S along the solution of the244

Euler–Lagrange equation (2) over the time interval rt0, ts is245

Sq0pqptqq :“ż t

t0

Lpqpsq, 9qpsqqds, where qp¨q is the solution of the Euler–Lagrange equation. (14)

This is referred to as Jacobi’s solution of the Hamilton–Jacobi equation. Here, we assume that the initial246

position qp0q is fixed and the final position qptq depends on the initial velocity v0 “ 9qp0q. By taking a247

variation δqptq of the endpoint qptq, one obtains a partial differential equation satisfied by Spq, tq:248

Hˆ

q,BSBq

˙

“ 0. (15)

This is the Hamilton–Jacobi equation, when H does not explicit depend on t.249

Conversely, it is shown that if Sq0pqq is a solution of the Hamilton–Jacobi equation then Sq0pqq is250

a generating function for the family of canonical transformations (or symplectic flows) that describe251

the dynamics defined by Hamilton’s equations. This result is the theoretical basis for the powerful252

technique of exact integration called separation of variables.253

There are two uses of Sq0pqq. First, it serves to characterize the Dirichlet-to-Neumann map, which254

refers to the correspondence between the boundary data pq0, qq P QˆQ with the initial data pq0, v0q P255

TQ of a dynamical system. Second, it provides a foliation of the configuration space Q, around the256

point q0 and parameterized by t, that is defined by the condition Sq0pqq “ const.257

In the rest of the paper, we will view Sq0pqq as a scalar-valued function of pq0, qptqq, which we258

refer to as the exact discrete Lagrangian LEd : QˆQ Ñ R,259

LEd pq0, q1; hq “ ext

qPC2pr0,hs,Qqqp0q“q0,qphq“q1

ż h

0Lpqptq, 9qptqqdt. (16)

This is equivalent to the expression for Jacobi’s solution, as the stationarity conditions of this variational260

characterization are simply the Euler–Lagrange equations. Furthermore, this characterization has the261

added benefit that it is well-defined even if the Lagrangian is degenerate. The exact discrete Lagrangian262

provides us with the time-h flow map for the Euler–Lagrange equation. Given a fixed initial point q0,263

this defines a map which takes q P Q to an initial velocity v0, such that the Euler–Lagrange trajectory264

qptqwith initial condition pqp0q, 9qp0qq “ pq0, v0q has boundary values pqp0q, qphqq “ pq0, qq. This is the265

Dirichlet-to-Neumann map QˆQ Ø TQ, pq0, qq ÞÑ pq0, v0q.266

To address the Dirichlet-to-Neumann map more generally, let us first recall the definition of a267

retraction:268

Definition 2.1 ([1], Definition 4.1.1 on p. 55). A retraction on a manifold Q is a smooth mapping R : TQ Ñ269

Q with the following properties: Let Rq : TqQ Ñ Q be the restriction of R to TqQ for an arbitrary q P Q; then,270

(i) Rqp0qq “ q, where 0q denotes the zero element of TqQ;271

(ii) with the identification T0qpTqQq » TqQ, Rq satisfies272

T0qRq “ idTqQ, (17)

where T0qRq is the tangent map of Rq at 0q P TqQ.273


Eq. (17) implies that the map Rq : TqQ Ñ Q is invertible in some neighborhood of 0q in TqQ. Its274

inverse is conveniently denoted as R : TQ Ñ QˆQ, which is defined by275

Rpvqq :“ pq,Rqpvqqq. (18)

It is easy to see that R : TQ Ñ QˆQ is also invertible in some neighborhood of 0q P TQ for any q P Q.276

Let us introduce a special class of coordinate charts that are compatible with a given retraction277

map R : TQ Ñ Q. A coordinate chart pU, ϕq with U an open subset in Q and ϕ : U Ñ Rn is said to be278

retraction compatible at q P U if279

(i) ϕ is centered at q, i.e., ϕpqq “ 0;280

(ii) the compatibility condition281

Rpvqq “ ϕ´1 ˝ Tq ϕpvqq (19)

holds, where we identify T0Rn with Rn as follows: Let ϕ “ pq1, . . . , qnq with qi : U Ñ R for282

i “ 1, . . . , n. Then283

vi B

Bqi ÞÑ pv1, . . . , vnq, (20)

where BBqi is the unit vector in the qi-direction in T0Rn.284

An atlas for the manifold Q is retraction compatible if it consists of retraction compatible coordinate285

charts.286

In Eq. (19), we assumed that Tq ϕpvqq P ϕpUq Ă Rn and so strictly speaking Rq is defined on287

pTq ϕq´1pϕpUqq Ă TqQ. However, it is always possible to define a coordinate chart such that ϕpUq “ Rn288

by stretching out the open set ϕpUq to Rn so that (19) is defined for any vq P TqQ.289

Retraction maps provide general means to relate Q ˆ Q to TQ: in essence it provides a290

correspondence between tqu ˆQ and TqQ for all q P Q (we may take q P Q to mean the projection of291

QˆQ onto either the first or the second slot).292

2.6. Variational mechanics and the Pontryagin bundle293

Lagrangian and Hamiltonian mechanics can be combined into Dirac mechanics, which is described294

on the Pontryagin bundle TQ‘ T˚Q, which has position, velocity, and momentum as local coordinates.295

Just as the Euler–Lagrange equations of motion arises out of Hamilton’s principle, Hamilton’s296

equations can also arise from Hamilton’s phase space principle:297

δ

ż t2

t1

rxp, 9qy ´ Hpq, pqs dt “ 0.

On the Pontryagin bundle TQ‘ T˚Q, which has local coordinates pq, v, pq, a relaxation of Hamilton’s298

principle (1) is the Hamilton–Pontryagin variational principle, which uses a Lagrange multiplier p to299

impose the second-order condition v “ 9q,300

δ

ż t2

t1

`

Lpq, vq ´ xp, 9q´ vyq˘

dt “ 0. (21)

This encapsulates both Hamilton’s and Hamilton’s phase space variational principles, as well as the301

Legendre transform, and gives the implicit Euler–Lagrange equations,302

9q “ v, 9p “BLBq

, p “BLBv

.

The last equation explicitly imposes the primary constraint condition, which is important when303

describing degenerate Lagrangian systems, such as electrical circuits. Note that the p are interpreted as304

Lagrange multipliers (Tulczyjew and Urbanski, 1999) in addition to its usual interpretation as conjugate305


momenta. The three equations can be combined by eliminating v and p to recover the Euler–Lagrange306

equations.307

An important application of Hamilton–Jacobi theory is in optimal control theory. Consider a308

typical optimal control problem,309

minup¨q

ż T

0Lpq, uq dt,

subject to the constraints,310

9q “ f pq, uq,

and the boundary conditions qp0q “ q0 and qpTq “ qT . We convert constrained optimization to311

unconstrained optimization by using Lagrange multipliers p (sometimes called the costate or auxiliary312

variables), and we can define the augmented cost functional:313

Srus :“ż T

0tLpq, uq ` xp, 9q´ f pq, uqyu dt “

ż T

0

xp, 9qy ´ Hpq, p, uq(

dt,

where we introduced the costate variables p, and also defined the control Hamiltonian,314

Hpq, p, uq :“ xp, f pq, uqy ´ Lpq, uq.

The variables pq, pq forms a Hamiltonian system, so we impose the optimality condition,315

BHBupq, p, uq “ 0,

to obtain the equation for the optimal control u “ u˚pq, pq, and we obtain the Hamiltonian,316

Hpq, pq :“ maxu

Hpq, p, uq “ Hpq, p, u˚pq, pqq.

We also define the optimal cost-to-go function,317

Cpq, tq :“ż T

t

!

Lpq, u˚q ` xp, 9q´ f pq, u˚qy)

ds

“

ż T

t

!

xp, 9qy ´ Hpq, pq)

ds “ S˚ ´ Spq, tq,

where pqpsq, ppsqq for s P r0, Ts is the solution of Hamilton’s equations with the above H such that318

qptq “ q; and S˚ is the optimal cost319

S˚ :“ż T

0

!

xp, 9qy ´ Hpq, pq)

ds “ż T

0

!

xp, 9qy ´ Hpq, p, u˚pq, pqq)

ds “ Sru˚s,

and the function Spq, tq is defined by320

Spq, tq :“ż t

0

!

xp, 9qy ´ Hpq, pq)

ds.

Since this definition coincides with (14), the function Spq, tq “ S˚´Cpq, tq satisfies the Hamilton–Jacobi321

equation (15); this reduces to the Hamilton–Jacobi–Bellman (HJB) equation for the optimal cost-to-go322

function Cpq, tq:323

BCBt`min

u

"B

BCBq

, f pq, uqF

` Lpq, uq*

“ 0. (22)

It can also be shown that the costate p of the optimal solution is related to the solution of the324

Hamilton–Jacobi–Bellman equation.325


3. Discrete Formulation of Geometric Mechanics326

In this section, we review various schemes for discretizing mechanics (see, e.g., [15]). Geometric327

mechanics focuses on the differential geometric structure of the configuration manifolds, the associated328

symplectic and Poisson structures on the phase space, and the conservation laws generated by329

symmetries, and geometric structure-preserving numerical integration aims to preserve as many330

of these geometric properties as possible under discretization. The main idea is to start from the331

canonical symplectic form ω on T˚Q, and look at the symplectomorphisms that preserve ω or its332

pullback via the Legendre transforms to QˆQ or TQ.333

3.1. Symplectomorphisms from T˚Q to TQ and to QˆQ334

Given a cotangent bundle T˚Q with a symplectic form ω, we wish to endow the bundles TQ and335

QˆQ with a symplectic structure. Given a function L : TQ Ñ R, the Legendre transform is viewed as336

the fiber derivative FL : TQ Ñ T˚Q, pq, 9qq ÞÑ pq, BLB 9qq. The pullback of ω with respect to FL yields a337

symplectic structure ωL “ FL˚ω on TQ.338

Similarly, given a function Ld : QˆQ Ñ R, we define two discrete fiber derivatives, FL˘d : QˆQ Ñ339

T˚Q, which serve as discrete Legendre transforms:340

FL`d pqk, qk`1q “ pqk`1, D2Ldpqk, qk`1qq, (23)

FL´d pqk, qk`1q “ pqk,´D1Ldpqk, qk`1qq. (24)

Here D1, D2 refers to taking a derivative with respect to the first or second slot, respectively:341

D1Ldpqk, qk`1q “BLdpqk, qk`1q

Bqk, D2Ldpqk, qk`1q “

BLdpqk, qk`1q

Bqk`1.

The two choices of discrete fiber derivatives correspond to whether one views QˆQ as a bundle over342

Q with respect to π´ : pqk, qk`1q ÞÑ qk or π` : pqk, qk`1q ÞÑ qk`1, i.e., projection onto the first or the343

second slot. These induce symplectic structures ω˘Ld“ pFL˘d q

˚ω on QˆQ by pullback.344

pTQ, ωLq //

FL

pTQ, ωLq

FL

pT˚Q, ωq

F //

FL´d

~~

FL`d

pT˚Q, ωq

FL´d

~~

FL`d

pQˆQ, ω´Ld

q99

pQˆQ, ω`Ldq

99pQˆQ, ω´Ld

q pQˆQ, ω`Ldq

345

Let F : T˚Q Ñ T˚Q be a symplectic map and let the maps denoted by the dotted arrows in the346

figure above be defined by requiring that the diagram commutes. Then, these maps are also symplectic347

maps, and the fiber derivative FL is a symplectomorphism between pTQ, ωLq and pT˚Q, ωq, and the348

discrete fiber derivatives FL˘d are symplectomorphisms between pQˆQ, ω˘Ldq and pT˚Q, ωq.349


3.2. Discrete Lagrangian mechanics350

The aim of geometric structure-preserving numerical integration is to preserve as many geometric351

conservation laws as possible under discretization. Discrete variational mechanics [15] is based on the352

discrete Hamilton’s principle,353

δÿn´1

i“0Ldpqi, qi`1q “ 0, (25)

where the endpoints q0 and qn are fixed, and the discrete Lagrangian, Ld : Q ˆ Q Ñ R, is a Type I354

generating function of the symplectic map. Recall that there exists an exact discrete Lagrangian (16) LEd ,355

that generates the exact time-h flow of a Lagrangian system, but it cannot be computed in general. One356

possible method of constructing computable discrete Lagrangians is the Galerkin approach, which357

involves replacing the infinite-dimensional function space C2pr0, hs, Qq and the integral in (16) with a358

finite-dimensional function space and a quadrature formula, respectively. Below are two examples of359

discrete Lagrangians:360

(i) Symplectic midpoint integrator361

Ldpq0, q1, hq “ hLˆ

q0 ` q1

2,

q1 ´ q0

2

˙

.

This can be obtained from the Galerkin approach by considering the family of linear polynomials362

as the finite-dimensional function space, and the midpoint rule as the quadrature formula.363

(ii) Störmer–Verlet integrator364

Ldpq0, q1, hq “h2

„

Lˆ

q0,q1 ´ q0q

2

˙

` Lˆ

q1,q1 ´ q0q

2

˙

.

This can be obtained from the Galerkin approach by considering the family of linear polynomials365

as the finite-dimensional function space, and the trapezoidal rule as the quadrature formula.366

Performing variational calculus on the discrete Hamilton’s principle (25) yields the discrete367

Euler–Lagrange (DEL) equations,368

D2Ldpqk´1, qkq `D1Ldpqk, qk`1q “ 0. (26)

The above equation implicitly defines the discrete Lagrangian map FLd : pqk´1, qkq ÞÑ pqk, qk`1q at points369

sufficiently close to the diagonal of QˆQ. This is equivalent to the implicit discrete Euler–Lagrange370

(IDEL) equations,371

pk “ ´D1Ldpqk, qk`1q, pk`1 “ D2Ldpqk, qk`1q, (27)

which is precisely the characterization of a symplectic map in terms of Type I generating function. It372

implicitly defines the discrete Hamiltonian map FLd : pqk, pkq ÞÑ pqk`1, pk`1q, and it is symplectic with373

respect to the canonical symplectic form ωcan on T˚Q, i.e., F˚Ldωcan “ ωcan.374

The two discrete fiber derivatives FL˘d induce a single unique discrete symplectic form ωLd “ ω˘Ld“375

pFL˘d q˚ωcan on QˆQ,376

ωLdpqk, qk`1q “B2Ld

BqkBqk`1pqk, qk`1qdqk ^ dqk`1, (28)

and the discrete Lagrangian map is symplectic with respect to ωLd on QˆQ, i.e., F˚LdωLd “ ωLd .377

The discrete Lagrangian and Hamiltonian maps can be expressed in terms of the discrete fiber378

derivatives, FLd “ FL`d ˝ pFL´d q´1, and FLd “ pFL´d q

´1 ˝FL`d , respectively. This characterization of the379

discrete flow maps underlies the proof of the variational error analysis theorem.380


These properties may be summarized in the following commutative diagram,381

pTQ, ωLq //

FL

pTQ, ωLq

FL

pT˚Q, ωcanq

FLd //

FL´d

zz

FL`d

$$

pT˚Q, ωcanq

FL´d

zz

FL`d

$$pQˆQ, ωLdq FLd

// pQˆQ, ωLdq FLd

// pQˆQ, ωLdq

If the exact discrete Lagrangian LEd is used, then the discrete Hamiltonian map FLd is equal to382

the time-h flow map of Hamilton’s equations, and the dotted arrow is the time-h flow map of the383

Euler–Lagrange equations.384

The variational integrator approach to constructing symplectic integrators simplifies the numerical385

analysis of these methods. In particular, the task of establishing the geometric conservation properties386

and order of accuracy of the discrete Lagrangian map FLd reduces to the simpler task of verifying387

certain properties of the discrete Lagrangian instead.388

3.3. Discrete Hamilton–Jacobi formulation389

In the context of discrete variational mechanics, discrete Hamilton–Jacobi theory can be viewed390

as a composition theorem which relates the composition of symplectic maps generated by a Type II391

generating function H`d pqk, pk`1qwith a symplectic map generated by a Type I generating function392

Skdpq0, qkq. By convention, the first argument q0 in the composition generating function is typically393

omitted, and we simply consider it to be a function Skdpqkq of the final position qk.394

The right discrete Hamiltonian, H`d pqk, pk`1q, is related to the discrete Lagrangian by the Legendre395

transform,396

H`d pqk, pk`1q “ xpk`1, qk`1y ´ Ldpqk, qk`1q,

where we impose the condition that pk`1 “ D2Ldpqk, qk`1q. Equivalently, this can be characterized397

variationally by H`d pqk, pk`1q “ extpk rxpk`1, qk`1y ´ Ldpqk, qk`1qs. This leads to a discrete Hamilton’s398

principle in phase space,399

δn´1ÿ

i“0

xpi`1, qi`1y ´ H`d pqi, pi`1q(

“ 0,

which yields the right discrete Hamilton’s equations,400

qk`1 “ D2H`d pqk, pk`1q, pk “ D1H`d pqk, pk`1q, (29)

which is precisely the characterization of a symplectic map in terms of Type II generating function.401

The continuous Hamilton–Jacobi equation can be derived by considering the evolution properties402

of Jacobi’s solution, which is the action integral evaluated along the solution of the Euler–Lagrange403

equations. One can derive a discrete Hamilton–Jacobi theory by considering a discrete analogue of404

Jacobi’s solution, expressed in terms of the right discrete Hamiltonian,405

Skdpqkq ”

k´1ÿ

i“0

pi`1 ¨ qi`1 ´ H`d pqi, pi`1q(

,

which we evaluate along a solution of the right discrete Hamilton’s equations (29). From this, we have,406

Sk`1d pqk`1q ´ Sk

dpqkq “ pk`1 ¨ qk`1 ´ H`d pqk, pk`1q, (30)


where pk`1 is considered to be a function of qk and qk`1. Taking derivatives with respect to qk`1, we407

obtain,408

DSk`1d pqk`1q “ pk`1 `

Bpk`1Bqk`1

¨“

qk`1 ´D2H`d pqk, pk`1q‰

,

but the term inside the parenthesis vanishes as we are restricting this to a solution of the right discrete409

Hamilton’s equations. Therefore, we have that410

DSk`1d pqk`1q “ pk`1,

which when substituted into (30) yields the discrete Hamilton–Jacobi equation,411

Sk`1d pqk`1q ´ Sk

dpqkq “ DSk`1d pqk`1q ¨ qk`1 ´ H`d pqk, DSk`1

d pqk`1qq.

3.4. Discrete Hamilton-Pontryagin principle412

Leok and Ohsawa [13] considered the discrete Hamilton’s principle and relaxed the discrete413

second-order condition,414

q1k “ q0

k`1,

and reimposed it using Lagrange multipliers pk`1, in order to derive the discrete Hamilton–Pontryagin415

principle on pQˆQq ˆQ T˚Q,416

δ”

ÿn´1

i“0Ldpq0

i , q1i q `

ÿn´2

i“0pi`1pq0

i`1 ´ q1i qı

“ 0. (31)

Here, the superscripts 0, or 1 on qi refers to the first or second slot, respectively, in QˆQ. This in turn417

yields the implicit discrete Euler–Lagrange equations,418

q1k “ q0

k`1, pk`1 “ D2Ldpq0k , q1

kq, pk “ ´D1Ldpq0k , q1

kq, (32)

where D1, D2 denote as before the partial derivative with respect to the first or second argument in419

Ld. Making the identification qk “ q0k “ q1

k´1, the last two equations define the discrete fiber derivatives,420

FL˘d : QˆQ Ñ T˚Q as given by (23) and (24). Discrete fiber derivatives induce a discrete symplectic form,421

ωLd ” pFL˘d q˚ωcan, and the discrete Lagrangian map FLd ” pFL´d q

´1 ˝ FL`d : pqk´1, qkq ÞÑ pqk, qk`1q422

and the discrete Hamiltonian map FLd ” FL`d ˝ pFL´d q´1 : pqk, pkq ÞÑ pqk`1, pk`1q preserve ωLd and423

ωcan, respectively.424

4. Information Geometry425

4.1. Statistical structure on M426

On a differentiable manifold M endowed with a metric g and a torsion-free affine connection ∇,427

the compatibility of a metric g and a connection ∇ is encoded by the cubic form 3-tensor field C “ ∇g,428

i.e., the covariant derivative of g. In a local coordinate system with basis Bi ” BBxi, the metric tensor g429

is locally represented by430

gijpxq “ gpBi, Bjq, (33)

and the components of ∇ takes the contravariant form Γlij, where431

∇BiBj “ÿ

l

Γlij Bl , (34)


or the covariant form Γij,k, where432

Γij,k “ gp∇BiBj, Bkq “ÿ

l

glkΓlij. (35)

Torsion-freeness of ∇ implies the symmetry of its (first) two lower indices, i.e.,433

Γlijpxq “ Γl

jipxq, Γij,lpxq “ Γji,lpxq.

We can now compute the cubic form,434

CpBi, Bj, Bkq “ p∇BkgqpBi, Bjq “ BkgpBi, Bjq ´ gp∇Bk

Bi, Bjq ´ gpBi,∇BkBjq,

or in components,435

Cijk “Bgij

Bxk´ Γki,j ´ Γkj,i. (36)

When the cubic form is identically zero, ∇ is said to be parallel with respect to g. A torsion-free436

connection parallel to g is called the Levi-Civita connection p∇ associated to the given metric g:437

BkgpBi, Bjq “ gp p∇BkBi, Bjq ` gpBi, p∇Bk

Bjq. (37)

The fundamental theorem of Riemannian geometry establishes the existence and uniqueness of the438

Levi-Civita connection, which is a solution of (37), and is given by,439

pΓkij “

ÿ

l

gkl

2

ˆ

Bgil

Bxj `Bgjl

Bxi ´Bgij

Bxl

˙

.

Generalizing the notion of parallelism of a connection is the notion of conjugacy (denoted by ˚)440

between two connections. A connection ∇˚ is said to be conjugate (or dual) to ∇ with respect to g if441

BkgpBi, Bjq “ gp∇BkBi, Bjq ` gpBi,∇˚Bk

Bjq. (38)

Clearly, p∇˚q˚ “ ∇. Moreover, p∇, which satisfies (37), is special in the sense that it is self-conjugate442

p p∇q˚ “ p∇. Writing out (38):443

Bgij

Bxk “ Γki,j ` Γ˚kj,i, (39)

where Γ˚kj,i is defined analogously to (34) and (35),444

∇˚BiBj “

ÿ

l

Γ˚lij Bl ,

so that445

Γ˚kj,i “ gp∇˚BjBk, Biq “

ÿ

l

gilΓ˚lkj .

From (36) and (38), the cubic form can now be written as446

Cijk “ Γ˚kj,i ´ Γkj,i.

Clearly, Cijk “ Cjik, i.e., it is symmetric with respective to its first two indices. When both ∇ and ∇˚447

are torsion-free, this implies that448

Γkj,i “ Γjk,i, Γ˚kj,i “ Γ˚jk,i,


then Cijk “ Cikj, which leads to C being totally symmetric in all the indices,449

Cijk “ Cikj “ Ckij “ Ckji “ Cjki “ Cjik.

Requiring that Cijk is totally symmetric imposes a compatibility condition between g and ∇, so450

that they form a Codazzi pair (see Simon, 2000), which generalizes the Levi-Civita coupling (whose451

corresponding cubic form Cijk ” 0). Lauritzen (1987) defined a statistical manifold pM, g,∇q to be a452

manifold M equipped with metric g and connection ∇ such that i) ∇ is torsion-free; ii) ∇g ” C is453

totally symmetric. Equivalently, a manifold has a statistical structure when the conjugate (with respect454

to g) ∇˚ of a torsion-free connection ∇ is also torsion-free. In this case, ∇˚g “ ´C, and the Levi-Civita455

connection ∇ “ p∇`∇˚q2.456

On a statistical manifold, one can define a one-parameter family of affine connections ∇pαq, called457

α-connections (α P R) [2]:458

∇pαq “ 1` α

2∇` 1´ α

2∇˚. (40)

Obviously, ∇p0q “ p∇ is the Levi-Civita connection, and the cubic form is given by ∇pαqg “ αC.459

The curvature/flatness of a connection ∇ is described by the Riemann curvature tensor R, defined460

as461

RpBi, BjqBk “ p∇Bi∇Bj ´∇Bj∇BiqBk.

Writing RpBi, BjqBk “ř

l RlkijBl and substituting (34), the components of the Riemann curvature tensor462

are463

Rlkijpxq “

BΓljkpxq

Bxi ´BΓl

ikpxqBxj `

ÿ

mΓl

impxqΓmjkpxq ´

ÿ

mΓl

jmpxqΓmikpxq.

By definition, Rlkij is antisymmetric when i Ø j. The covariant form of the Riemann curvature is464

Rlkij “ÿ

mglm Rm

kij.

When the connection is torsion-free, Rlkij is antisymmetric when i Ø j or when k Ø l, and symmetric465

when pi, jq Ø pl, kq. It is related to the Ricci tensor Ric by Rickj “ř

i,l Rlkijgil .466

In addition, it can be shown that the curvatures Rlkij, R˚lkij for the pair of conjugate connections467

∇,∇˚ satisfy468

Rlkij “ R˚lkij.

A connection is said to be flat when Rlkijpxq ” 0. So, ∇ is flat if and only if ∇˚ is flat. In this case, the469

manifold is said to be dually-flat, and the metric g takes on a particular form (to be discussed later).470

4.2. Divergence function and induced geometry471

A divergence function D : MˆMÑ Rě0 on a manifold M with respect to a local chart V Ď Rn is472

a C3 function satisfying473

(i) Dpx, yq ě 0,@x, y P V, with equality holding if and only if x “ y;474

(ii) Dipx, xq “ D,jpx, xq “ 0,@i, j P t1, 2, ¨ ¨ ¨ , nu;475

(iii) ´Di,jpx, xq is positive-definite.476

Here Dipx, yq “ Bxi Dpx, yq, D,ipx, yq “ ByiDpx, yq denote partial derivatives with respect to the i-th477

component of point x and of point y, respectively, and Di,jpx, yq “ BxiByjDpx, yq the second-order478

mixed derivative, and so on.479

On a manifold, divergence functions act as pseudo-distance functions that are nonnegative but480

need not be symmetric. Every divergence function induces a dualistic Riemannian structure, i.e.,481

statistical structure, which was first demonstrated by S. Eguchi (see [8]).482


Lemma 4.1. A divergence function induces a Riemannian metric g and a pair of torsion-free conjugate483

connections ∇,∇˚ given as484

gijpxq “ ´ Di,jpx, yqˇ

ˇ

x“y ,

Γij,kpxq “ ´ Dij,kpx, yqˇ

ˇ

ˇ

x“y,

Γ˚ij,kpxq “ ´ Dk,ijpx, yqˇ

ˇ

ˇ

x“y.

The Γij,k, Γ˚ij,k are torsion-free and are conjugate with respect to the induced metric gij. Hence, the485

divergence function D induces pM, g,∇,∇˚q, which is a statistical manifold (Lauritzen [12]).486

A popular divergence function is the Bregman divergence BΦpx, yq [6], which is associated to a487

strictly convex function Φ:488

BΦpx, yq “ Φpyq ´Φpxq ´ xy´ x, BΦpxqy, (41)

where BΦ “ rB1Φ, ¨ ¨ ¨ , BnΦs denotes the exterior derivative, and x¨, ¨yn denotes the canonical pairing of489

a vector x “ rx1, ¨ ¨ ¨ , xns P Rn and a covector u “ ru1, ¨ ¨ ¨ , uns P rRn (dual to Rn), i.e.,490

xx, uy “nÿ

i“1

xiui. (42)

Where there is no danger of confusion, the subscript n in x¨, ¨yn is often omitted. A basic fact in convex491

analysis is that the necessary and sufficient condition for a smooth function Φ to be strictly convex is492

BΦpx, yq ą 0, (43)

for all x ‰ y.493

Recall that when Φ is convex, its convex conjugate, rΦ : rV Ď rRn Ñ R, is defined through the494

Legendre transform:495

rΦpuq “ xpBΦq´1puq, uy ´ΦppBΦq´1puqq, (44)

with r

rΦ “ Φ and pBΦq “ pBrΦq´1. Since rΦ is also convex, by (43), we obtain the Fenchel inequality,496

Φpxq ` rΦpuq ´ xx, uy ě 0,

for any x P V, u P rV, with equality holding if and only if497

u “ pBΦqpxq “ pBrΦq´1pxq ÐÑ x “ pBrΦqpuq “ pBΦq´1puq, (45)

or, in component form,498

ui “BΦBxi ÐÑ xi “

BrΦBui

. (46)

Using conjugate variables, we can introduce the canonical divergence AΦ : V ˆ rV Ñ R` (and499

ArΦ : rV ˆV Ñ R`),500

AΦpx, vq “ Φpxq ` rΦpvq ´ xx, vy “ ArΦpv, xq.

They are related to the Bregman divergence (41) via the relation501

BΦpx, pBΦq´1pvqq “ AΦpx, vq “ BrΦppB

rΦqpxq, vq.


Though the Bregman divergence is not a metric, it satisfies a quadrilateral relation [23]: For any502

four points x, x1, x2, x3 P V,503

BΦpx, x1q ` BΦpx3, x2q ´ BΦpx, x2q ´ BΦpx3, x1q “ xx2 ´ x1, BΦpxq ´ BΦpx3qy .

As a special case, when x3 “ x1, BΦpx3, x1q “ 0, the above equality reduces to the Pythagorean504

(generalized cosine) relation among three points x, x1, x2 P V:505

BΦpx, x1q ` BΦpx1, x2q ´ BΦpx, x2q “ xx2 ´ x1, BΦpxq ´ BΦpx1qy .

This is the Pythagorean relation [3] for a dually-flat space. Using this relation, one can state506

minimization problems for divergence functions.507

The quadrilateral relation can be expressed in terms of the canonical divergence A as follows,508

AΦpx, uq ÀΦpy, vq ÁΦpx, vq ÁΦpy, uq “ xx´ y, v´ uy,

for any four points x, y P V, v, u P rV.509

Zhang [22] introduced the α-indexed family of Φ-divergence functions DpαqΦ on V ˆV,510

DpαqΦ px, yq “4

1´ α2

ˆ

1´ α

2Φpxq `

1` α

2Φpyq ´Φ

ˆ

1´ α

2x`

1` α

2y˙˙

. (47)

Furthermore, Dp˘1qΦ px, yq is defined by taking limαÑ˘1:511

Dp´1qΦ px, yq “ Dp1qΦ py, xq “ BΦpx, yq,

Dp1qΦ px, yq “ Dp´1qΦ py, xq “ BΦpy, xq.

Note that DpαqΦ px, yq satisfies the relation (called referential duality in [22,27]),512

DpαqΦ px, yq “ Dp´αqΦ py, xq,

that is, exchanging the two points in the directed distance amounts to α Ø ´α.513

4.3. Hessian manifolds and biorthogonal coordinates514

Applying Lemma 4.1 to the Bregman divergence BΦ induces the following metric,515

gijpxq “B2ΦpxqBxiBxj ,

and the pair of torsion-free conjugate connections,516

Γij,kpxq “ 0, Γ˚ij,kpxq “B3ΦpxqBxiBxjBxk .

In this case, M is dually-flat. This yields a Hessian manifold, where g takes the form of the Hessian of a517

strictly convex function Φ.518

Hessian manifolds enjoy a special status in information geometry, as they exhibit biorthogonal519

coordinates on M that are globally affine coordinates despite the nontrivial Riemannian (Hessian)520

metric on M.521

Consider the coordinate transform x ÞÑ u,522

Bi ”B

Bui“

ÿ

l

Bxl

Bui

B

Bxl “ÿ

l

FliBl ,


where the Jacobian matrix F is given by523

Fijpxq “Bui

Bxj , Fijpuq “Bxi

Buj,

ÿ

l

Fil Fl j “ δij, (48)

where δij is the Kronecker delta, which takes the value 1 when i “ j and 0 otherwise. If the new524

coordinate system u “ ru1, ¨ ¨ ¨ , uns (with components denoted by subscripts) is such that525

Fijpxq “ gijpxq, (49)

then the x-coordinate system and the u-coordinate system are said to be biorthogonal to each other,526

since, from the definition of the metric tensor (33),527

gpBi, B jq “ gpBi,ÿ

l

Fl jBlq “ÿ

l

Fl jgpBi, Blq “ÿ

l

Fl jgil “ δji .

In this case, we define528

gijpuq “ gpBi, B jq, (50)

which is equal to Fij, the Jacobian of the inverse coordinate transform u ÞÑ x. We also introduce the529

contravariant representation of the affine connection ∇ with respect to the u-coordinate system, and530

denote it by an unconventional notation Γrst , which is defined by531

∇BrBs “ÿ

tΓrs

t Bt;

similarly, Γ˚rst is defined by532

∇˚BrBs “

ÿ

tΓ˚rs

t Bt.

The covariant representation of the affine connections will be denoted by superscripted Γ and Γ˚,533

Γij,kpuq “ gp∇BiBj, Bkq, Γ˚ij,kpuq “ gp∇˚

BiBj, Bkq. (51)

The representation of the affine connections in the u-coordinate system (denoted by superscripts) and534

the x-coordinate system (denoted by subscripts) are related by535

Γrst puq “

ÿ

k

¨

˝

ÿ

i,j

Bxr

Bui

Bxs

BujΓk

ijpxq `B2xk

BurBus

˛

‚

BukBxt , (52)

and536

Γrs,tpuq “ÿ

i,j,k

Bxr

Bui

Bxs

Buj

Bxt

BukΓij,kpxq `

B2xt

BurBus. (53)

Similarly relations hold between Γ˚rst puq and Γ˚k

ij pxq, and between Γ˚rs,tpuq and Γ˚ij,kpxq.537

Analogous to (39), we have the following identity,538

B2xt

BusBur“BgrtpuqBus

“ Γrs,tpuq ` Γ˚ts,rpuq.

Therefore, with respect to biorthogonal coordinates, a pair of conjugate connections ∇,∇˚ satisfy,539

Γ˚ts,rpuq “ ´ÿ

i,j,k

girpuqgjspuqgktpuqΓij,kpxq, (54)


and540

Γ˚ tsr puq “ ´

ÿ

j

gjspuqΓtjrpxq. (55)

We now investigate conditions for the existence of biorthogonal coordinates on a Riemannian541

manifold pM, gq. From its definition (49), it can easily be shown that542

Proposition 4.2. A Riemannian manifold M with metric gij admits biorthogonal coordinates if and only ifBgij

Bxk543

is totally symmetric, i.e.,544

BgijpxqBxk “

BgikpxqBxj . (56)

In this case, M is Hessian, namely, its Riemannian metric gij can be written as the Hessian (second-derivative)545

of some convex function.546

That (56) is satisfied for biorthogonal coordinates is evident by virtue of (48) and (49). Conversely,547

given (56), there must be n functions uipxq, i “ 1, 2, ¨ ¨ ¨ , n, such that,548

BuipxqBxj “ gijpxq “ gjipxq “

BujpxqBxi .

The above identity implies that there exist a function Φ such that ui “ BiΦ and, by positive definiteness549

of gij, Φ would have to be a strictly convex function! In this case, the x- and u-variables satisfy (45),550

and the pair of convex functions, Φ and its conjugate rΦ, are related to gij and gij by551

gijpxq “B2ΦpxqBxiBxj ÐÑ gijpuq “

B2rΦpuq

BuiBuj.

It follows from Proposition 4.2 that a necessary and sufficient condition for a Riemannian manifold552

to admit biorthogonal coordinates it that its Levi-Civita connection is given by553

pΓij,kpxq ”12

ˆ

Bgik

Bxj `Bgjk

Bxi ´Bgij

Bxk

˙

“12Bgij

Bxk .

From this, the following can be shown:554

Corollary 4.3. A Riemannian manifold pM, gq admits a pair of biorthogonal coordinates x and u if and only if555

there exists a pair of conjugate connections ∇ and ∇˚ such that Γij,kpxq “ 0, Γ˚rs,tpuq “ 0.556

In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of557

connections. In fact, we can now define a pair of torsion-free connections by558

Γij,kpxq “ 0, Γ˚ij,kpxq “Bgij

Bxk ,

and show that they are conjugate with respect to g, that is, they satisfy (38). This means that we select559

an affine connection ∇ such that x is its affine coordinate system. From (53), when ∇˚ is expressed in560

u-coordinates,561

Γ˚rs,tpuq “ÿ

i,j,k

girpuqgjspuqBxk

But

BgijpxqBxk `

BgtspuqBur

“ÿ

i,j

girpuqˆ

´BgjspuqBut

gijpxq˙

`BgtspuqBur


“ ´ÿ

j

δrjBgjspuqBut

`BgtspuqBur

“ 0.

This implies that u is an affine coordinate system with respect to ∇˚. Furthermore,562

gijpuq “B2

rΦpuqBuiBuj

, Γij,kpuq “B3

rΦpuqBuiBujBuk

,

where rΦ is the convex conjugate of Φ. Therefore, biorthogonal coordinates are affine coordinates for a563

pair of dually-flat connections. On the manifold of parameterized probability density functions, if the564

x-coordinates are the natural parameters, then the u-coordinates are the expectations.565

5. Linking Information Geometry with Geometric Mechanics566

5.1. Symplectic structure on QˆQ induced from the divergence function D567

We will now establish the connection between information geometry and discrete geometric568

mechanics. The divergence function from information geometry can be viewed as a Type I generating569

function of a symplectic map, and in particular, it can be viewed as a discrete Lagrangian in the sense570

of discrete Lagrangian mechanics. More specifically, let the configuration manifold be the information571

manifold, i.e., Q “ M, and the discrete Lagrangian be the divergence function, i.e., Ld “ D. With572

this identification, we observe that the information geometric construction of symplectic structure on573

MˆM described below is nothing but the discrete symplectic structure on QˆQ given in (28) where574

the discrete Lagrangian Ld is replaced with the divergence function D.575

From information geometry, a divergence function D is given as a scalar-valued binary function on576

Q (of dimension n). We now view it as a unary function on QˆQ (of dimension 2n) that vanishes along577

the diagonal ∆Q Ă QˆQ. In this subsection, we investigate the conditions under which a divergence578

function can serve as a generating function of a symplectic structure on QˆQ. A compatible metric579

on QˆQ will also be derived. When restricted to the diagonal submanifold ∆Q, the skew-symmetric580

symplectic form will vanish, so ∆Q, which carries a statistical structure, is actually a Lagrangian581

submanifold.582

First, we fix a point x in the first slot, and a point y in the second slot of QˆQ – this results in583

two n-dimensional submanifolds of QˆQ that will be denoted, Qx “ Qˆ tyu » Q (with the y point584

fixed) and Qy “ txu ˆQ » Q (with the x point fixed), respectively. The canonical symplectic form ωx585

on the cotangent bundle T˚Qx is given by586

ωx “ÿ

i

dxi ^ dξi.

Given D, we define a map LD from Qy Ă QˆQ to T˚Qx, px, yq ÞÑ px, ξq, which is given by,587

LD : px, yq ÞÑ px,ÿ

i

Dipx, yqdxiq.

Recall that the comma indicates whether the variable being differentiated with respect to is in the first588

or second slot. It is easily checked that there exists a neighborhood of the diagonal ∆Q Ă QˆQ, such589

that the map LD is a diffeomorphism. In particular, the Jacobian of the map is given by590

˜

δij Dij0 Di,j

¸

,

which is nondegenerate in a neighborhood of the diagonal ∆Q.591


We calculate the pullback by LD of the canonical symplectic form ωx on T˚Qx to QˆQ:592

L˚D ωx “ L˚D pÿ

i

dxi ^ dξiq “ÿ

i

dxi ^ dDipx, yq

“ÿ

i

dxi ^ÿ

j

pDijpx, yqdxj `Di,jdyjq “ÿ

ij

Di,jpx, yqdxi ^ dyj.

Here,ř

ij Dijdxi ^ dxj “ 0, since by the equality of mixed partials, Dijpx, yq “ Djipx, yq always holds.593

Similarly, we consider the canonical symplectic form ωy “ř

i dyi ^ dηi on T˚Qy and define a594

map RD from QˆQ Ñ T˚Qy, px, yq ÞÑ py, ηq, which is given by595

RD : px, yq “ py,ÿ

i

D,ipx, yqdyiq.

Using RD to pullback ωy to QˆQ yields an analogous formula:596

R˚Dωy “ ´ÿ

ij

Di,jpx, yqdxi ^ dyj.

Therefore, based on canonical symplectic forms on T˚Qx and T˚Qy, we obtained the same597

symplectic form on QˆQ598

ωDpx, yq “ ´ÿ

ij

Di,jpx, yqdxi ^ dyj. (57)

Theorem 5.1. A divergence function D induces a symplectic form ωD (57) on QˆQ which is the pullback of599

the canonical symplectic forms ωx and ωy by the maps LD and RD ,600

L˚D ωy “ÿ

ij

Di,jpx, yqdxi ^ dyj “ ´R˚D ωx. (58)

With the symplectic form ωD given as above, it is easy to check that ωD is closed:601

dωD “ÿ

ijk

"

B3DBxkBxiByj dxk ^ dxi ^ dyj `

B3DBykBxiByj dyk ^ dxi ^ dyj

*

“ 0.

It was Barndorff-Nielsen and Jupp [5] who first proposed (57) as an induced symplectic form on602

QˆQ, apart from a minus sign; they called the divergence function D a york.603

The fact that this symplectic structure coincides with the one introduced in discrete mechanics604

should come as no surprise. The Qx and Qy submanifolds are related to the two ways of viewing605

Qˆ Q as a bundle over Q, depending on whether one chooses π1 : Qˆ Q Ñ Q, pq1, q2q ÞÑ q1 or606

π2 : QˆQ Ñ Q, pq1, q2q ÞÑ q2 as the bundle projection. Then, the maps LD , RD are, up to a sign, simply607

the discrete fiber derivatives FL˘d , where the discrete Lagrangian Ld is replaced by the divergence608

function D.609

5.2. Divergence as a Type I generating function610

As we have seen previously, symplectic maps are a natural way of describing the flow of611

Hamiltonian mechanics on the cotangent bundle T˚Q. We will now consider the characterization612

of symplectic maps in terms of generating functions, and in particular, we review three different613

parameterizations based on the classification given in Goldstein [10].614

Lemma 5.2. Given pq0, q1q P QˆQ, then pq0, p0q ÞÑ pq1, p1q on T˚Q is symplectic if and only if there exists615

S1 : QˆQ Ñ R such that616

p0 “ ´D1S1, p1 “ D2S1. (59)


To prove this, observe that617

dS1pq0, q1q “ pD1S1qdq0 ` pD2S1qdq1,

from which, we immediately obtain618

´dp0 ^ dq0 ` dp1 ^ dq1 “ 0 “ d2S1pq0, q1q “ dpD1S1q ^ dq0 ` dpD2S1q ^ dq1.

Identifying the corresponding terms yield (59).619

Type I generating functions S1 are linked with other types of generating functions via partial620

Legendre transforms. Fixing the first or second variable slot leads to, respectively, Type II or III621

generating functions, denoted S2, S3 respectively.622

Let H` be a submanifold, with local coordinates pq0, p1q, of Qˆ pT˚Qq, with local coordinates623

pq0, pq1, p1qq, where q1 is dependent on q0 and p1. Then pq0, p0q ÞÑ pq1, p1q on T˚Q is symplectic if and624

only if there exists S2 : H` Ñ R such that625

p0 “ D1S2, q1 “ D2S2. (60)

Likewise, let H´ be a submanifold, whose local coordinates are pp1, q0q, of pT˚Qq ˆQ with local626

coordinates ppq0, p0q, q1q where q0 is dependent on p0 and q1. Then pq0, p0q ÞÑ pq1, p1q on T˚Q is627

symplectic if and only if there exists S3 : H´ Ñ R such that628

q0 “ ´D1S3, p1 “ ´D2S3. (61)

In the case of discrete mechanics, the Type II generating function is denoted by H`pq0, p1q and629

the Type III generating function is denoted by H´pq1, p0q. We compute their exterior derivatives:630

dH`pq0, p1q “ D1H`pq0, p1qdq0 `D2H`pq0, p1qdp1, (62)

dH´pq1, p0q “ D1H´pq1, p0qdq1 `D2H´pq1, p0qdp0. (63)

From this, we obtain,631

0 “ d2H`pq0, p1q “ dpD1H`pq0, p1qdq0 `D2H`pq0, p1qdp1q (64)

“ dpp0dq0 ` q1dp1q “ dp0 ^ dq0 ` dq1 ^ dp1, (65)

0 “ d2H´pq1, p0q “ dpD1H´pq1, p0qdq1 `D2H´pq1, p0qdp0q (66)

“ dpp1dq1 ` q0dp0q “ dp1 ^ dq1 ` dq0 ^ dp0. (67)

Therefore, symplectic maps can be defined implicitly in terms of a Type II generating function632

H`pq0, p1q,633

q1 “ D2H`pq0, p1q, p0 “ D1H`pq0, p1q,

and a Type III generating function H´pq1, p0q,634

q0 “ D2H´pq1, p0q, p1 “ D1H´pq1, p0q.

More explicitly, these are related to the discrete Lagrangian Ldpq0, q1q, which is a Type I generating635

function, by the following partial Legendre transforms:636

H`pq0, p1q “ extq1txp1, q1y ´ Ldpq0, q1qu , (68)

H´pq1, p0q “ extq0txp0, q0y ´ Ldpq0, q1qu , (69)


or equivalently,637

H`pq0, p1q ´ xp1, q˚1 y ` Ldpq0, q˚1 q “ 0, (70)

H´pq1, p0q ´ xp0, q˚0 y ` Ldpq˚0 , q1q “ 0. (71)

The upshot of the above discussion is that p0, p1 are Legendre dual variables with respect to q0, q1,638

whereas in the fiberwise Legendre transform FL, it is pq0, v0q, pq1, v1qwhich are dual to pq0, p0q, pq1, p0q639

– the dual correspondence is q Ø p, instead of v Ø p. As before, the two discrete Legendre dualities640

are due to the two ways of viewing QˆQ as a bundle over Q.641

In the context of information geometry, H˘ is nothing but the partial Legendre transform of the642

divergence function Dpx, yq with respect to the first or second argument. Consider the Bregman643

divergence BΦpq0, q1q,644

BΦpq0, q1q ” Φpq1q ´Φpq0q ´ xBΦpq0q, q1 ´ q0y,

and view it as a discrete Lagrangian Ldpq0, q1q. Then, its partial Legendre transform with respect to q1,645

the Type II generating function H`, is646

H`pq0, p1q “

B

q1,BBΦpq0, q1q

Bq1

F

´ BΦpq0, q1q ,

which evaluates to647

H`pq0, p1q “ BΦpq1pq0, p1q, q0q,

where648

q1 “ pBΦq´1pp1 ` BΦpq0qq,

is obtained by solving649

p1 “BBΦpq0, q1q

Bq1“ BΦpq1q ´ BΦpq0q.

By substitution, we obtain,650

H`pq0, p1q “ BΦpBΦq´1pBΦpq0q ` p1q, q0q “ BrΦpBΦpq0q, BΦpq0q ` p1q.

Note that in this case, the Legendre dual of q1 is no longer p1 “ BΦpq1q as given by the fiberwise651

Legendre map, but is rather shifted by an amount BΦpq0q. It is interesting that H` still takes the form652

of B, as does Ld. This is a special property of taking the Bregman divergence as the generating function.653

5.3. D-divergence for decoupling L and H654

In geometric mechanics, Hamiltonian and Lagrangian dynamics represent one and the same655

dynamics – they are coupled; this is because Hpq, pq and Lpq, vq are related by the fiberwise Legendre656

transform FL “ pFHq´1 – in fact they are a Legendre pair. The conservation properties of the657

Hamiltonian approach with respect to the underlying symplectic geometry and the variational658

principles that arise in the Lagrangian and Hamilton–Jacobi theories reflect two sides of the same coin.659

TQ FL // T˚Q

QˆQ

Rq

ee

FLd

99

(72)

To appreciate this, we look at the interaction of three manifolds QˆQ, T˚Q and TQ. We take660

pqk, qk`1q to be the configuration variable q at successive time-step – it is the dynamical equation that661

governs the evolution from qk to qk`1. The Hamiltonian dynamics, which is encoded in the preservation662


of ωcan of T˚Q, governs discrete Hamiltonian flow Hpqk, pkq, through a Type I generating function663

Ldpqk, qk`1q. On the other hand, the Lagrangian flow, which is governed by the retraction map Rq, such664

as the Dirichlet-to-Neumann map induced by Jacobi’s solution Sq0pqq to the Hamilton–Jacobi equation.665

Those two dynamic updates qk Ñ qk`1 need not be identical. In mechanics, the Hamiltonian energy666

conservation system and the Lagrangian extremization system leads to one and the same dynamics,667

precisely because Lpqk, vkq and Hpqk, pkq are linked through the fiberwise Legendre transform FL at qk:668

Lpqk, vkq ` Hpqk, pkq ´ xvk, pkyqk ” 0.

In other words, L and H are perfectly coupled – with no duality gap.669

Information geometry, on the other hand, starts with a divergence (or contrast) function Dpq, v, pq670

on TQ‘ T˚Q, which measures the discrepancy between the two systems. Given Hpq, pq on T˚Q and671

Lpq, vq on TQ, we write672

Dpq, v, pq “ Lpq, vq ` Hpq, pq ´ xv, pyq.

Theorem 5.3. Let Lpq, vq and Hpq, pq be strictly convex functions, defined on TQ and T˚Q in terms of the673

variables pq, vq and pq, pq, respectively. Then, for the following statements, any two imply the rest:674

(i) D “ 0;675

(ii) Hpq, ¨q and Lpq, ¨q are (fiberwise) convex conjugate (Legendre dual) to each other;676

(iii) p “ BLBv “ FLpvq ;677

(iv) v “ BHBp “ FHppq .678

When D “ 0, ωLpξL, ¨q “ ιξL ωL “ dEL with ξL “ p 9q, 9vq, and679

ELpq, vq “ÿ

i

vi BLpq, vqBvi ´ Lpq, vq.

Then,680

dEL “ÿ

i

¨

˝

ÿ

j

B2LBqiBvj vj ´

BLBqi

˛

‚dqi `ÿ

ij

vj B2L

BviBvj dvi.

The Euler–Lagrange equations are equivalent to681

ξL “

ˆ

BELBq

,BELBv

˙

.

Our insight here is that D does not have to vanish identically. The consequence is that we do not682

require the Lagrangian dynamics (extremization dynamics) and Hamiltonian dynamics (conservation683

dynamics) to be coupled; they will be allowed to evolve independently. The function D allows us to684

study fiberwise symplectomorphisms of Dirac manifolds.685

Let us consider the case that (ii) holds, i.e., Hpq, ¨q and Lpq, ¨q are Legendre duals to each other.686

Then, the canonical divergence D can be written as the Bregman divergence BH and BL, after applying687

the fiberwise Legendre map FL or FH,688

p “BLpq, vqBv

ðñ v “BHpq, pqBp

.

This implies that,689

BLpq, v0, v1q ” Dˆ

q, v0,BLpq, vqBv1

˙

“ Lpq, v1q ´ Lpq, v0q ´

B

BLpq, v0q

Bv0, v1 ´ v0

F

,

BHpq, p0, p1q ” Dˆ

q,BHpq, p0q

Bp0, p1

˙

“ Hpq, p1q ´ Hpq, p0q ´

B

BHpq, p0q

Bp0, p1 ´ p0

F

,


and they satisfy,690

BLpq, v0, v1q “ BHpq, p1, p0q.

This is the reference-representation biduality [23,27], which is satisfied whenever L and H are Legendre691

duals of each other.692

5.4. Variational error analysis693

Recall that we previously defined the exact discrete Lagrangian LEd (16), which is related to Jacobi’s694

solution of the Hamilton–Jacobi equation. The significance of the exact discrete Lagrangian is that it695

generates the exact discrete time flow of a Lagrangian system, but in general it cannot be computed696

explicitly. Instead, a computable discrete Lagrangian Ld is used instead to construct a discretization of697

Lagrangian mechanics, and it induces the discrete Lagrangian map FLd .698

Since discrete variational mechanics is expressed in terms of discrete Lagrangians, and the699

exact discrete Lagrangian generates the exact flow map of a continuous Lagrangian system, it is700

natural to ask whether we can characterize the order of accuracy of the Lagrangian map FLd as701

an approximation of the exact flow map, in terms of the extent to which the discrete Lagrangian702

Ld approximates the exact discrete Lagrangian LEd . This is indeed possible, and is referred to as703

variational error analysis. Theorem 2.3.1 of [15] shows that if a discrete Lagrangian Ld approximates the704

exact discrete Lagrangian LEd to order p, i.e., Ldpq0, q1; hq “ LE

d pq0, q1; hq Òphp`1q, then the discrete705

Hamiltonian map, FLd : pqk, pkq ÞÑ pqk`1, pk`1q, viewed as a one-step method, is order p accurate.706

As mentioned above, the divergence function D from information geometry can serve as a Type I707

generating function of a symplectic map, and hence it can be viewed as a discrete Lagrangian in the708

sense of discrete Lagrangian mechanics. A divergence function also generates the Riemannian metric709

and affine connection structures on the diagonal manifold (Lemma 4.1), in addition to generating the710

symplectic structure on MˆM. Viewed in this way, a natural question is to what extent can we view711

the divergence function as corresponding to the exact Lagrangian flow of an associated continuous712

Lagrangian. We can show that713

Theorem 5.4. The exact discrete Lagrangian LEd pqp0q, qphq, hq associated with the geodesic flow, with respect714

to the induced metric g, can be approximated by a divergence function Dpqp0q, qphqq up to third order Oph3q715

accuracy,716

LEd pqp0q, qphq, hq “ hLpqp0q, vp0qq `Dpqp0q, qphqq Òph3q,

if and only if M is a Hessian manifold, i.e., D is the Bregman divergence BΦ, for some strictly convex function717

Φ.718

Proof. Let us expand the exact discrete Lagrangian to obtain,719

LEd pqp0q, qphq, hq “ hLpq, 9qq `

h2

2

ˆ

BLBqpq, 9qq `

BLB 9qpq, 9qq ¨ :q

˙

Òph3q

“ hLpq, vq `h2

2

ˆ

BLBqi vi `

BLBvi ai

˙

Òph3q, (73)

where qp0q “ q, v “ 9qp0q, a “ :qp0q.720

From the definition of a divergence function:721

D,jpq, qq “ 0.

Differentiating with respect to q,722

0 “B

Bqi D,jpq, qq “ Di,jpq, qq `D,ijpq, qq,


so723

Di,jpq, qq “ ´D,ijpq, qq.

Differentiating with respect to q again,724

B

Bqk D,ijpq, qq “ Dk,ijpq, qq `D,ijkpq, qq.

Observe that the left-hand side is the metric induced by the divergence function,725

B

Bqk D,ijpq, qq “ ´Di,jpq, qq “ gijpqq.

Expanding Dpq, q1q around q “ qp0q for q1 “ qphq:726

q1 “ q` vh`12

ah2 `Oph3q,

we obtain727

Dpq, q1q “h2

2gijpqq `

h3

4gijpqqpviaj ` vjaiq `

h3

6Γijkpqqvivjvk `Oph4q,

where728

Dpq, qq “ 0, D,ipq, qq “ 0, D,ijpq, qq ” gijpqq “ ´Di,jpq, qq, D,ijkpq, qq ” Γijkpqq.

Clearly, gij “ gji, and729

Γijk “ Γikj “ Γkij “ Γkji “ Γjki “ Γjik.

Comparing the corresponding terms in powers of h, we obtain,730

Lpq, vq “12

gijvivj, (74)

vi BLBxi pq, vq ` ai BL

Bvi pq, vq “12

gijpaivj ` ajviq `13

Γijkvivjvk. (75)

Substituting (74) into (75) yields731

Bgij

Bqk “ Γijk,

where,732

Bgij

Bqk “Bgik

Bqj .

This, according to Proposition 4.2, demonstrates that the manifold M is Hessian, and hence dually-flat.733

So, for the expansions to agree to Oph3q, the inducing divergence function D must be the Bregman734

divergence BΦ.735

6. Summary736

In this paper, we show the differences and connections between geometric mechanics and737

information geometry in canonically prescribing differential geometric structures on a smooth manifold738

Q. The Legendre transform plays crucial roles in both; however, they serve very different purposes.739

In geometric mechanics, the fiberwise Legendre map serves to link the cotangent bundle T˚Q with740

tangent bundle TQ, whereas in information geometry, the Legendre transform relates the pair of741

biorthogonal coordinates, which are special coordinates on a dually-flat manifold Q. More specifically,742

FL (or its inverse FH) is invoked to establish the isomorphism between T˚Q Ø TQ in geometric743

mechanics, whereas in information geometry, a Hessian metric g built upon a convex function on Q is744


used for the correspondence between two coordinate systems on Q, and also for potentially (but not745

necessarily) establishing a correspondence between TQ and T˚Q.746

The link between information geometry and discrete mechanics is much stronger when one747

considers the discrete version (as opposed to the traditional, continuous version) of geometric748

mechanics. Both endow a symplectic structure ωˆ on QˆQ, through the use of a discrete Lagrangian749

Ld in the case of geometric mechanics and a divergence function D in the case of information geometry750

– in fact they are both Type I generating functions for inducing ωˆ on Q ˆ Q via pullback from751

the canonical symplectic structure ωcan on T˚Q. Using the Legendre transform, Type II generating752

functions can be constructed, which lead to the (right) discrete Hamiltonian H` in geometric mechanics753

and to the dual divergence function in information geometry.754

Our analyses draw a distinction between the fiberwise Legendre map (which is used in continuous755

mechanics setting), the Legendre transform between biorthogonal coordinates (which is used in756

information geometry), and the Legendre transform between Type I and Type II generating functions757

(which is used in the setting of both discrete geometric mechanics and information geometry). The758

distinctions are more prominent when one considers the Pontryagin bundle TQ‘ T˚Q. There, we759

can construct a divergence function that actually measures the duality gap between the Lagrangian760

function and the Hamiltonian function that generate a pair of (forward and backward) Legendre maps.761

In so doing, we demonstrate that information geometry can be viewed as an extension of geometric762

mechanics based on Dirac mechanics and geometry, with a full-blown duality between the Lagrangian763

and Hamiltonian components.764

7. Discussion and Future Directions765

Noda [16] showed that, with respect to the symplectic structure ωˆ on QˆQ, the Hamiltonian766

flow of the canonical divergence A induces geodesic flows for ∇ and ∇˚. He interpreted biorthogonal767

coordinates as a single coordinate system on QˆQ, in a way that is consistent with treating A as the768

Type I generating function on QˆQ. It remains unclear how the resulting Hamiltonian flow is related769

to dynamical flow on the Dirac manifold.770

In another related work, Ay and Amari [4] sought to characterize the canonical form of divergence771

functions for general (non dually-flat) manifolds. They investigated the retraction map Rq : tqu ˆQ Ñ772

TqQ which we discussed in Section 2.5, and used the exponential map associated to any torsion-free773

affine connection ∇ on TQ. This approach, based on parallel transport, in essence generates a semispray774

on TQ, and is quite different from characterizing the dynamics using the Hamilton flow on T˚Q. Note775

that even though one may define a symplectic structure (through pullback) on TQ as well, [4] treats776

the semispray on TQ as the primary geometric object. Future research will clarify its relation to our777

approach, which is based on defining a symplectic structure on QˆQ directly.778

Finally, comparing information geometry with geometric mechanics may shed light on universal779

machine learning algorithms. In machine learning or state estimation applications, we wish to have the780

estimated distribution be influenced by the observations, so that the estimated distribution eventually781

becomes consistent with the observed data. Let txiu denote the sequence of predictions by (possibly782

a series of) model distributions X , and let tyiu denote the actual data generated by an unknown783

distribution Y that we are trying to estimate. In practice, the divergence functions are constructed784

so that the pseudo-distance between two distributions X and Y can be computed using only complete785

information about X and samples from Y . As such, we can measure the mismatch between the current786

prediction xi and the actual data yi using Dpxi, yiq, since the asymmetry in the definition of D is such787

that we only require samples yi from the true but unknown distribution. So, adding a momentum term788

to ensure gentle change in model predictions, a possible choice of a discrete Lagrangian for generating789

the discrete dynamics for the machine learning application might be given by790

Lpxi, xi`1q “ Dpxi, xi`1q `Dpxi`1, yi`1q,


where the first term can be interpreted as the action associated with the kinetic energy, and the second791

term is the action associated with the potential energy. By construction, the term Dpxi, yiq vanishes792

when the prediction xi is consistent with the actual observation yi, and it is positive otherwise, so the793

term Dpxi, yiq can be viewed as a potential energy term that penalizes mismatch between the estimated794

distribution and the observational data. Our variational error analysis may thus shed light on an795

asymptotic theory of inference where sample size N Ñ8 is akin to discretization step h Ñ 0.796

The link between geometric mechanics and information geometry, as revealed through our797

present investigation, is still rather preliminary. The possibility of a unified mathematical framework798

for information and mechanics is intriguing and remains a challenge for future research.799

Acknowledgements800

We thank the anonymous reviewers for helping to improve this paper. The first author is801

supported by NSF grants CMMI-1334759 and DMS-1411792. The second author is supported by802

DARPA/ARO Grant W911NF-16-1-0383.803

804

1. Abraham, R.; Marsden, J. Foundations of mechanics; Benjamin/Cummings Publishing Co. Inc. Advanced805

Book Program: Reading, Mass., 1978. Second edition, revised and enlarged, With the assistance of Tudor806

Ratiu and Richard Cushman.807

2. Amari, S. Differential-geometrical methods in statistics; Vol. 28, Lect. Notes Stat., Springer-Verlag, New York,808

1985; pp. v+290.809

3. Amari, S.; Nagaoka, H. Methods of information geometry; Vol. 191, Translations of Mathematical Monographs,810

American Mathematical Society, Providence, RI; Oxford University Press, Oxford, 2000; pp. x+206.811

Translated from the 1993 Japanese original by Daishi Harada.812

4. Ay, N.; Amari, S. A novel approach to canonical divergences within information geometry. Entropy 2015,813

17, 8111–8129.814

5. Barndorff-Nielsen, O.E.; Jupp, P.E. Yorks and symplectic structures. Journal of Statistical Planning and815

Inference 1997, 63, 133–146.816

6. Bregman, L.M. The relaxation method of finding the common point of convex sets and its application817

to the solution of problems in convex programming. USSR Computational Mathematics and Physics 1967,818

7, 200–217.819

7. Eguchi, S. Second order efficiency of minimum contrast estimators in a curved exponential family. Ann.820

Stat. 1983, 11, 793–803.821

8. Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.822

9. Fei, T.; Zhang, J. Interaction of Codazzi couplings with (para)-Kähler geometry. Results Math. 2017. In823

press.824

10. Goldstein, H. Classical mechanics, second ed.; Addison-Wesley Publishing Co., Reading, Mass., 1980; pp.825

xiv+672. Addison-Wesley Series in Physics.826

11. Lall, S.; West, M. Discrete variational Hamiltonian mechanics. J. Phys. A 2006, 39, 5509–5519.827

12. Lauritzen, S. Statistical manifolds. In Differential Geometry in Statistical Inference; Amari, S.;828

Barndorff-Nielsen, O.; Kass, R.; Lauritzen, S.; Rao, C., Eds.; IMS: Hayward, CA, 1987; Vol. 10, IMS829

Lect. Notes, pp. 163–216.830

13. Leok, M.; Ohsawa, T. Variational and Geometric Structures of Discrete Dirac Mechanics. Found. Comput.831

Math. 2011, 11, 529–562.832

14. Marsden, J.E.; Ratiu, T.S. Introduction to mechanics and symmetry, second ed.; Vol. 17, Texts in Applied833

Mathematics, Springer-Verlag, New York, 1999; pp. xviii+582. A basic exposition of classical mechanical834

systems.835

15. Marsden, J.; West, M. Discrete mechanics and variational integrators. Acta Numer. 2001, 10, 317–514.836

16. Noda, T. Symplectic structures on statistical manifolds. J. Aust. Math. Soc. 2011, 90, 371–384.837

17. Rockafellar, R.T. Convex analysis; Princeton Mathematical Series, No. 28, Princeton University Press,838

Princeton, N.J., 1970; pp. xviii+451.839


18. Shima, H. The geometry of Hessian structures; World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ,840

2007; pp. xiv+246.841

19. Tao, J.; Zhang, J. Transformations and coupling relations for affine connections. Differ. Geom. Appl. 2016,842

49, 111–130.843

20. Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part II: Variational structures. J.844

Geom. Phys. 2006, 57, 209–250.845

21. Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems.846

J. Geom. Phys. 2006, 57, 133–156.847

22. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195.848

23. Zhang, J. Dual scaling of comparison and reference stimuli in multi-dimensional psychological space. J.849

Math. Psych. 2004, 48, 409–424.850

24. Zhang, J.; Matsuzoe, H. Dualistic Riemannian Manifold Structure Induced from Convex Functions. In851

Advances in Applied Mathematics and Global Optimization: In Honor of Gilbert Strang; Gao, D.; Sherali, H., Eds.;852

Springer: Boston, MA, 2009; pp. 437–464.853

25. Zhang, J.; Li, F. Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence854

Functions. In Geometric Science of Information, Paris, France, August 28-30, 2013. Proceedings; Nielsen, F.;855

Barbaresco, F., Eds.; Springer: Berlin, Heidelberg, 2013.856

26. Zhang, J. Divergence Functions and Geometric Structures They Induce on a Manifold. In Geometric Theory857

of Information; Nielsen, F., Ed.; Springer, 2014; pp. 1–30.858

27. Zhang, J. Reference duality and representation duality in information geometry. AIP Conference Proceedings859

2015, 1641, 130–146.860

c© 2017 by the authors. Submitted to Entropy for possible open access publication under the terms and conditions861

of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).862

http://creativecommons.org/licenses/by/4.0/.

semantic scholar › a5e5 › ac9771d2e3c6...version september 9, 2017 submitted to entropy 3 of 31...

Documents