4 Hamiltonian Mechanics

We begin by exploring the properties of the Legendre transform, which relates (convex) Lagrangians and Hamiltonians. We then introduce Hamilton’s equations, with some optional perspective from differential geometry, which yields one of the most mathematically satisfying and beautiful treatments of the topic.

Key takeaways of this part are:

  1. The Hamiltonian \(H(q, p, t)\) is the Legendre transform of \(\mathcal L(q, \dot q, t)\) in \(\dot q\), fixing \(q, t\); fixing \(q, t\), the conjugate variables \(\dot q\) and \(p\) are functions of each other via \(\nabla_{\dot q} \mathcal L = p\) and \(\dot q = \nabla_p H\) (lemma 4.1). The Jacobian \(J_{\dot q\to p}\) is the Hessian of \(\mathcal L\) in \(\dot q\).
  2. When the \(\sup\) is dropped in the Legendre transform, the two variables are always understood to obey an implicit equation.
  3. On strictly convex functions, the Legendre transform preserves (strict) convexity, and is an involution.
  4. The scalar Hamiltonian function \(H(q, p, t)\in \mathbb R\) generates the flow of time in phase space, or time-translation symmetry.
  5. Converse to Noether’s theorem: every scalar function on phase space generates a flow, corresponding to its continuous symmetry.

Legendre transform

The Legendre transform is a general transform of functions which interacts particularly well with convex functions.

Definition 4.1 (Legendre transform) Given a function \(f:\mathbb R^n\to \mathbb R\), its Legendre transform \(\mathbb L f: \mathbb R^n\to \mathbb R\) is \[ (\mathbb L f)(p) = \sup_x \, \langle x, p\rangle- f(x) \]
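To make the definition concrete, the \(\sup\) can be approximated numerically by maximizing over a finite grid; a minimal sketch (the choice \(f(x) = x^2/2\), whose transform is analytically \(p^2/2\), is an assumption for illustration):

```python
import numpy as np

def legendre(f, p, xs):
    """Approximate (Lf)(p) = sup_x <x, p> - f(x) by maximizing over a grid xs."""
    return np.max(xs * p - f(xs))

xs = np.linspace(-10, 10, 200_001)   # search grid (assumes the sup is attained inside it)
f = lambda x: 0.5 * x**2             # strictly convex; analytically (Lf)(p) = p^2/2

for p in [-2.0, 0.0, 1.0, 3.0]:
    assert abs(legendre(f, p, xs) - 0.5 * p**2) < 1e-6
```

The grid bounds are a deliberate assumption: for a grid that does not contain the maximizer, the approximation undershoots the true supremum.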

Convex functions

Convex functions are characterized by \[ f(\lambda x + \bar \lambda y) \leq \lambda f(x) + \bar \lambda f(y), \quad \lambda \in [0, 1], \bar \lambda = 1 - \lambda \] Assuming they are well-behaved enough, they are equivalently defined by having positive semi-definite second derivatives (Hessians) at all points: \[ \forall x: (\mathcal H_x f)(x) \geq 0, \quad (\mathcal H_x f)_{ij}(x) = (\partial_{x_i x_j}^2 f)(x) \] Here \(A\geq 0\) for a matrix \(A\) is understood as all eigenvalues of \(A\) being nonnegative. Convexity is further equivalent to the first-order condition \[ \forall x, y: f(y) \geq f(x) + \nabla f(x) \cdot (y - x) \] The function \(f\) is strictly convex when the inequalities are strict (for \(x \neq y\) and \(\lambda \in (0, 1)\)).

Proposition 4.1 The gradient map \(x\mapsto (\nabla f)(x)\) is injective (hence invertible onto its image) if \(f\) is strictly convex.

Proof: For \(x\neq y\), strict convexity implies \[\begin{align} \nabla f(x) \cdot (y-x) < f(y) - f(x), \quad \nabla f(y) \cdot (x-y) < f(x) - f(y) \end{align}\] Suppose for contradiction that \(\nabla f(x) = \nabla f(y)\); substituting \(\nabla f(y)=\nabla f(x)\) into the second inequality and multiplying both sides by \(-1\) yields the contradiction \[ \nabla f(x)\cdot (y-x) > f(y) - f(x) \]

Properties of the Legendre transform

We now show that the Legendre transform is well-behaved for strictly convex functions.

Assuming \(f\) well-behaved, the extremality condition is \[ \nabla_x \left[\langle x, p\rangle- f(x)\right] = p - \nabla_x f(x) = 0 \iff p = \nabla_x f \] This means that the \(\sup\) may be dropped when it is understood that \(x\) is an implicit function of \(p\) via the equation \(p = \nabla_x f\) : \[ (\mathbb L f)(p=\nabla_x f) = x\cdot p - f(x) \]
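As a sanity check of the implicit-equation view (using the assumed example \(f(x) = e^x\), which is strictly convex), the equation \(p = \nabla_x f\) can be inverted symbolically and substituted back:

```python
import sympy as sp

x, p = sp.symbols('x p', positive=True)
f = sp.exp(x)

# extremality condition p = f'(x) = e^x  =>  x = log p
x_of_p = sp.solve(sp.Eq(p, sp.diff(f, x)), x)[0]

# drop the sup: (Lf)(p) = x*p - f(x) with x the implicit function of p
Lf = sp.simplify(x_of_p * p - f.subs(x, x_of_p))

# known closed form of the transform of e^x
assert sp.simplify(Lf - (p * sp.log(p) - p)) == 0
```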

The following lemma will be used to prove the Hamilton equation \(\dot q = \nabla_p H\).

Lemma 4.1 (two-way relation between x and p) Given a Legendre transform \[ g(p) = (\mathbb L f)(p) = \sup_x p\cdot x - f(x) \] Then \(p = \nabla_x f \iff x = \nabla_p g\).

Proof: Assuming \(p = \nabla_x f\), then we can drop the \(\sup\) in the Legendre transform, yielding \[ g(p) = p\cdot x - f(x) \] Take the gradient w.r.t. \(p\) on both sides; note that we need to invoke the chain rule since \(p, x\) are related: \[\begin{align} \nabla_p g(p) &= \nabla_p (p\cdot x - f(x)) = x + J_{p\to x} p - J_{p\to x} (\nabla_x f)_{=p} = x \end{align}\]
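The two-way relation can be checked symbolically for a sample strictly convex function (the choice \(f(x) = x^4/4\) is an assumption for illustration): inverting \(p = \nabla_x f\) and building \(g\) explicitly, we verify \(x = \nabla_p g\).

```python
import sympy as sp

x, p = sp.symbols('x p', positive=True)
f = x**4 / 4

x_of_p = sp.solve(sp.Eq(p, sp.diff(f, x)), x)[0]    # invert p = f'(x) = x^3
g = sp.simplify(x_of_p * p - f.subs(x, x_of_p))     # Legendre transform with sup dropped
assert sp.simplify(sp.diff(g, p) - x_of_p) == 0     # x = grad_p g, as the lemma claims
```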

Proposition 4.2 (Jacobian of Legendre transform) The Jacobian of the Legendre transform \(x\to p\) is the Hessian of \(f\) \[ J_{x\to p} = d_x p = d_x \left(\nabla_x f\right) = \mathcal H_x f \]

Proposition 4.3 (convexity invariance) Suppose \(f(x)\) is strictly convex so that \(\mathcal H_x f> 0\) (positive-definite) everywhere; then \(\mathbb L f\) is strictly convex, i.e. \(\mathcal H_p (\mathbb L f)>0\).

Proof: Compute the Hessian explicitly and recognize \(\nabla_x f = p\) \[\begin{align} \mathcal H_p(\mathbb L f) &= d_p [\nabla_p (\mathbb L f)] = d_p \left[ x + J_{p\to x} p - J_{p\to x}\nabla_x f \right] \\ &= d_p x = J_{p\to x} = J_{x\to p}^{-1} = \left(\mathcal H_x f\right)^{-1} \end{align}\] Strict convexity follows since the inverse of a positive-definite matrix is positive-definite.
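As a one-dimensional illustration of the reciprocal Hessians (the quadratic \(f(x) = \frac{a}{2}x^2\) with \(a>0\) is an assumed example): \[ p = \nabla_x f = a x \implies x = \frac{p}{a}, \qquad (\mathbb L f)(p) = x p - f(x) = \frac{p^2}{a} - \frac{p^2}{2a} = \frac{p^2}{2a} \] so \(\mathcal H_x f = a\) and \(\mathcal H_p(\mathbb L f) = 1/a\) are indeed inverses of each other.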

Proposition 4.4 (involution) For convex \(f\), the Legendre transform is an involution \[ \left[\mathbb L (\mathbb L f)\right] (x) = f(x) \]

Proof: Unroll the definition, dropping both \(\sup\)s via their implicit equations: \[\begin{align} \left[\mathbb L (\mathbb L f)\right] (x) &= x\cdot p - (\mathbb L f)(p) \\ &= x\cdot p - \left[p\cdot x - f(x)\right] = f(x) \end{align}\] Note the dependence here: fixing \(x\), the outer transform fixes \(p=\nabla_x f\) (by lemma 4.1 applied to \(g = \mathbb L f\)), and this \(p\) in turn determines the same \(x\) in the inner Legendre transform.

Hamilton’s equations

Definition 4.2 (Hamiltonian) Given the Lagrangian \(\mathcal L(q, \dot q, t)\), the Hamiltonian is the Legendre transform of \(\mathcal L\) in the variable \(\dot q\), holding \(q, t\) fixed: \[ H(q, p, t) = p\cdot \dot q - \mathcal L(q, \dot q(p), t) \] where \(p = \nabla_{\dot q} \mathcal L\) and \(\dot q(p)\) is the implicit inverse of this equation.
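As a standard worked example (the specific form of \(\mathcal L\) below is an assumption, not forced by the definition), take \(\mathcal L = \frac{1}{2}m\dot q^2 - V(q)\), which is strictly convex in \(\dot q\) since \(\mathcal H_{\dot q}\mathcal L = m > 0\): \[ p = \nabla_{\dot q}\mathcal L = m\dot q \implies \dot q = \frac{p}{m}, \qquad H = p\cdot\frac{p}{m} - \frac{p^2}{2m} + V(q) = \frac{p^2}{2m} + V(q) \] i.e. the Hamiltonian is the total energy, kinetic plus potential.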

Theorem 4.1 (Hamilton's equations)

The following equations are equivalent to the Euler-Lagrange equations 2.2 when the Lagrangian \(\mathcal L(q, \dot q, t)\) is strictly convex in \(\dot q\): \[\begin{aligned} \dot p = -\nabla_q H, \quad \dot q = \nabla_p H \end{aligned}\]

Collecting \(\xi = q \oplus p\), i.e. \((\xi_1, \xi_2, \dots, \xi_{2n}) = (q_1, p_1, q_2, p_2, \dots, q_n, p_n)\), and writing \(H(q, p, t) = H(\xi, t)\), we obtain the compact expression \[ \partial_{t} \xi = \Gamma\, \nabla_\xi H, \quad \Gamma = \bigoplus \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \tag{4.1} \]

Proof: Take care that \(\dot q, p\) are implicit functions of each other and recognize \(p=\nabla_{\dot q}\mathcal L\); holding \(q\) fixed: \[\begin{align} \nabla_p H(q, p, t) &= \nabla_p \left[p\cdot \dot q - \mathcal L(q, \dot q(p), t)\right] \\ &= \dot q + J_{p\to \dot q} p - J_{p\to \dot q}(\nabla_{\dot q}\mathcal L) = \dot q \end{align}\] For the second equation, we’re holding \(p\) fixed, but \(\dot q\) is still dependent upon \(q\), then \[\begin{align} \nabla_q H(q, p, t) &= \nabla_q \left[ p\cdot \dot q - \mathcal L(q, \dot q(p), t) \right] \\ &= J_{q\to \dot q}p - \nabla_q \mathcal L - J_{q\to \dot q} (\nabla_{\dot q}\mathcal L) = - \nabla_q \mathcal L \\ \dot p &= d_t \nabla_{\dot q}\mathcal L = \nabla_q \mathcal L \end{align}\] the last equation holds by the Euler-Lagrange equations.
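Hamilton’s equations can be integrated numerically; a minimal sketch for the harmonic oscillator \(H = (q^2 + p^2)/2\) (unit mass and stiffness assumed for illustration), using a symplectic (semi-implicit) Euler step:

```python
import numpy as np

# Harmonic oscillator H(q, p) = (q^2 + p^2)/2, so Hamilton's equations read
# dq/dt = dH/dp = p and dp/dt = -dH/dq = -q.
def flow(q, p, t, n_steps=100_000):
    """Integrate Hamilton's equations with a symplectic (semi-implicit) Euler step."""
    dt = t / n_steps
    for _ in range(n_steps):
        p -= dt * q   # kick: update momentum using the old position
        q += dt * p   # drift: update position using the new momentum
    return q, p

# One full period of the oscillator returns (q, p) = (1, 0) to itself.
q, p = flow(1.0, 0.0, 2 * np.pi)
assert abs(q - 1.0) < 1e-3 and abs(p) < 1e-3
```

The kick-then-drift ordering makes the update map symplectic (its Jacobian has unit determinant), which is why the trajectory closes up well over a full period.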

Differential geometry perspective

Hamilton’s equations \(\partial_{t} \xi = \Gamma \, \nabla_\xi H\) are the component representation of the following implicit equation, which defines the vector field \(X_H\) generating the time-translation of the system: \[ \omega(X_H, \, \cdot\, ) = dH \] Here \(\omega= dq \wedge dp = \sum_j dq_j\wedge dp_j\) is the symplectic form represented by \(\Gamma\), and \(dH\) is the differential of \(H\).

How is this an implicit equation? Both sides of the equation consume a vector field to output a scalar, and \(X_H\) is implicitly defined to satisfy the equation.

We next proceed to show that this is, in fact, equivalent to Hamilton’s equations.

Remark (differential geometry basics). Given a chart \(x=(x_1, \cdots, x_n)\) on a manifold \(M\) (like fixing a basis), a vector field \(X\) can be understood as a directional derivative operation that consumes a scalar function \(f\) and outputs \(X\, f\) (read as \(X\) acting on \(f\)). The value of \((X\, f)(x)\) encodes how much \(f\) changes along the direction of the vector field \(X\) at location \(x\). The representation of \(X\) in this chart \(x\) is then \[ X = \sum_{j=1}^n X_j \partial_{j} \] where each \(X_j\) is a scalar function and \(\partial_{j} = \partial_{x_j}\) denotes the partial differentiation operation in the \(j\)-th variable; in this representation, the action of \(X\) can be locally written in coordinate form as \[ (X\, f)(x) = \sum_{j=1}^n X_j \partial_{j} f(x) = (X_j)\cdot \nabla f(x) \] Here \((X_j)\) is understood as a vector denoting the “direction”; in this sense the chart \(x\) also fixes a basis \(\partial_{1}, \cdots, \partial_{n}\) for the space of vector fields (tangent bundle). Given a scalar function \(f\), its differential \(df\) consumes a vector field and outputs a scalar (function) \[ df\, X = X\, f \] We can similarly consider a basis representation for the differentials (which live on the cotangent bundle) with the notation \(d x_1, \cdots, dx_n\) \[ df = \sum_{j=1}^n (\partial_{j} f)\, dx_j, \quad dx_j\, \partial_{i} = \delta_{ij} \]

The last piece we need is the wedge product: assuming a basis \(B=\{\partial_{p}, \partial_{q}\}\) (here \(p\), \(q\) are scalar variables), the symplectic form \(\omega = dq \wedge dp\) consumes two arguments in \(B\) and outputs a number subject to the following rules, in addition to being linear in all its arguments: \[ (dq \wedge dp)(\partial_{a}, \partial_{b}) = \begin{cases} 1 & (\partial_{a}, \partial_{b}) = (\partial_{q}, \partial_{p}) \\ -1 & (\partial_{a}, \partial_{b}) = (\partial_{p}, \partial_{q}) \\ 0 & \text{otherwise}. \end{cases} \]

Back to Hamilton’s equation \(\omega(X_H, \cdot) = dH\). Fix the basis \(\{\partial_{q}, \partial_{p}\}\) (the order matters!) and consider two vector fields \(X, Y\) with components \[ X = X_q \partial_{q} + X_p \partial_{p}, \quad Y = Y_q \partial_{q} + Y_p \partial_{p} \] The representation \(\Gamma\) of \(\omega\) is read from the equation \[\begin{align} \omega(X, Y) &= X_qY_q \omega(\partial_{q}, \partial_{q})_{=0} + X_p Y_q \omega(\partial_{p}, \partial_{q})_{=-1} + X_qY_p \omega (\partial_{q}, \partial_{p})_{=1} + X_pY_p \omega (\partial_{p}, \partial_{p})_{=0} \\ &= X_qY_p - X_p Y_q = \begin{pmatrix} X_q \\ X_p \end{pmatrix}^T \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} Y_q \\ Y_p \end{pmatrix} \implies \Gamma = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \end{align}\] The implicit equation holds for all vector fields \(Y\) (since both sides of the equation consume a vector field and output a scalar function) \[ \omega (X_H, Y) = dH\, Y = Y\, H \implies (X_H)^T \Gamma Y = Y_q \partial_{q} H + Y_p \partial_{p} H = (\nabla H)^T Y, \quad \forall Y \] where we have identified \(X_H, Y\) with their \(\partial_{q}, \partial_{p}\) coordinate representations on the RHS. Solving for \(X_H\), using \(\Gamma^T = -\Gamma\) and \(\Gamma^{-1} = -\Gamma\) (since \(\Gamma^2 = -I\)), yields \[ (X_H)^T \Gamma = (\nabla H)^T \iff \nabla H = ((X_H)^T \Gamma)^T = -\Gamma X_H \iff X_H = \Gamma \nabla H \] This is exactly the component expression of Hamilton’s equations we derived earlier.
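The defining identity \(\omega(X_H, Y) = dH\, Y\) for \(X_H = \Gamma\nabla H\) can be spot-checked numerically; a sketch with an illustrative random gradient (not tied to any particular \(H\)):

```python
import numpy as np

Gamma = np.array([[0.0, 1.0], [-1.0, 0.0]])   # matrix of omega = dq ^ dp in the basis (d_q, d_p)

rng = np.random.default_rng(0)
gradH = rng.normal(size=2)    # an arbitrary (illustrative) gradient dH at a point
X_H = Gamma @ gradH           # the claimed solution X_H = Gamma grad H

# omega(X_H, Y) = X_H^T Gamma Y should equal dH(Y) = grad H . Y for every Y
for _ in range(5):
    Y = rng.normal(size=2)
    assert abs(X_H @ Gamma @ Y - gradH @ Y) < 1e-12
```

Algebraically this is the identity \(\Gamma^T\Gamma = I\): \((\Gamma\nabla H)^T\Gamma Y = (\nabla H)^T\Gamma^T\Gamma Y = (\nabla H)^T Y\).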

Algebra of Poisson brackets

From now on, we work with a system with \(n\) degrees of freedom and phase space of dimension \(2n\). Recall the definition of \(\xi\) and \(\Gamma\) in equation (4.1).

Definition 4.3 (Poisson bracket) The Poisson bracket between two smooth scalar functions is \[ \left[{A}, {B}\right]_\xi = \nabla_q A \cdot \nabla_p B - \nabla_p A \cdot \nabla_q B = (\nabla_\xi A)^T \Gamma (\nabla_\xi B) \] We also use the derivative map \(D_A: B\mapsto \left[{A}, {B}\right]_\xi\) to denote partial application of the Poisson bracket.
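The bracket for a single degree of freedom can be sketched symbolically (the helper name `pb` is our own, for illustration):

```python
import sympy as sp

q, p = sp.symbols('q p')

def pb(A, B):
    """Poisson bracket [A, B] = dA/dq dB/dp - dA/dp dB/dq (one degree of freedom)."""
    return sp.simplify(sp.diff(A, q) * sp.diff(B, p) - sp.diff(A, p) * sp.diff(B, q))

assert pb(q, q) == 0 and pb(p, p) == 0 and pb(q, p) == 1   # fundamental brackets
assert pb(q, p**2 / 2) == p                                # [q, H] = dH/dp for H = p^2/2
```

The last line is Hamilton’s equation \(\dot q = \nabla_p H\) in bracket form.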

The quantum analogues of the following relations are the canonical commutation relations between \(\hat x, \hat p\). They are the true foundational axioms which specify a theory.

Definition 4.4 (fundamental Poisson brackets) \(\left[{q_j}, {q_k}\right]_\xi = \left[{p_j}, {p_k}\right]_\xi = 0, \left[{q_j}, {p_k}\right]_\xi = \delta_{jk}\). Equivalently, these quantities are “probing” the entries of \(\Gamma\) \[ \left[{\xi_j}, {\xi_k}\right]_\xi = \Gamma_{jk} \]

Using the Poisson brackets, Hamilton’s equations (theorem 4.1) can be rewritten as \[ \partial_{t} \xi = -D_H \xi \implies \xi(t) = e^{- t D_H} \xi(0). \] The vector field \(D_H\) is also known as the symplectic gradient of \(H\). The time-evolution operator for time \(t\) is \(\xi \mapsto e^{-tD_H}\xi\). The following result shows the compatibility of the Jacobian map with the exponential.
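For the quadratic Hamiltonian \(H = (q^2+p^2)/2\) (a single degree of freedom, assumed for illustration), \(\nabla_\xi H = \xi\), so \(\partial_t \xi = \Gamma\xi\) and the time-evolution operator reduces to the matrix exponential \(e^{t\Gamma}\), a rotation of phase space; a quick numerical check using `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm   # matrix exponential

Gamma = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Since Gamma^2 = -I, we expect e^{t Gamma} = I cos t + Gamma sin t: a rotation.
t = 0.7
rotation = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
assert np.allclose(expm(t * Gamma), rotation)
```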

Proposition 4.5 (Jacobian of an exponential transform) Given an exponential map \(\xi \mapsto e^{-\lambda T}\xi\) (here \(T\) may be an operator instead of a constant matrix!), assuming convergence we obtain \(J_{\xi \to e^{-\lambda T}\xi} = e^{-\lambda J_{\xi \to T\xi}}\).

Proof: Note that \(J_{T+S} = J_T+J_S\), and \(J_{TS}=J_TJ_S\). Since the exponential is defined as a power series using addition and multiplication, we obtain \(J_{\exp(T)} = \exp(J_T)\).

Theorem 4.2 (evolution of scalar under flow generated by another scalar) Given two scalar functions \(C, G\) (observables) and \(\xi(\lambda)=e^{-\lambda D_G} \xi(0)\), the observable \(C\) evolves according to \[ d_\lambda C = \left[{C}, {G}\right]_\xi = -D_G C \] To be extremely clear about dependences, we write \(d_\lambda C(\xi(\lambda)) = -D_G \big|_{\xi(\lambda)} C(\xi(\lambda))\), here \(C, G\) are both implicit functions of \(\lambda\) through \(\xi(\lambda)\).

Proof: Indices which are repeated twice are summed over (contracted), expand to obtain \[\begin{align} d_\lambda C &= (\nabla_\xi C)\cdot d_\lambda \xi = -(\nabla_\xi C)\cdot D_G \xi = (\partial_{\xi_j} C) [\xi_j, G] \\ &= (\partial_{\xi_j} C) \partial_{\xi_l} \xi_j \Gamma_{lk} (\partial_{\xi_k} G) = (\partial_{\xi_j} C)\Gamma_{jk}(\partial_{\xi_k} G) = \left[{C}, {G}\right]_\xi \end{align}\]

Corollary 4.1 (scalars are conserved under their Hamiltonian flow) Given a scalar function \(G\) on phase space, if \(\xi\) evolves according to the flow generated by \(G\) as below, then \(d_\lambda G(\xi(\lambda)) = 0\). \[ d_\lambda \xi = - D_G \xi \iff \xi(\lambda) = e^{- \lambda D_G} \xi(0) \]

Proof: By antisymmetry of the Poisson bracket, \(d_\lambda G = \left[{G}, {G}\right]_\xi = 0\).

Definition 4.5 (canonical transform) A map \(\xi\mapsto \varphi(\xi)\) is a canonical transform if it is a symplectomorphism, i.e. it preserves the canonical form \(\omega = \sum_{j=1}^n dq_j\wedge dp_j\) under the pullback map \[ \varphi^*\, \omega = \omega \] In components, this is equivalent to \(J\Gamma J^T = \Gamma\), where \(J\) is the Jacobian of the transformation \(\varphi\).
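The component condition \(J\Gamma J^T = \Gamma\) can be checked symbolically; a sketch with one canonical point transform (\(Q = e^q\), \(P = p e^{-q}\), chosen for illustration) and one non-canonical uniform scaling:

```python
import sympy as sp

q, p = sp.symbols('q p')
Gamma = sp.Matrix([[0, 1], [-1, 0]])

def is_canonical(Q, P):
    """Check J Gamma J^T = Gamma for the transform (q, p) -> (Q, P)."""
    J = sp.Matrix([[sp.diff(Q, q), sp.diff(Q, p)],
                   [sp.diff(P, q), sp.diff(P, p)]])
    return sp.simplify(J * Gamma * J.T - Gamma) == sp.zeros(2)

assert is_canonical(sp.exp(q), p * sp.exp(-q))   # point transform Q = e^q, P = p e^{-q}
assert not is_canonical(2 * q, 2 * p)            # uniform scaling rescales the form
```

The point transform rescales \(p\) by the reciprocal of the stretch in \(q\), which is exactly what preserving \(dq\wedge dp\) demands.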

Theorem 4.3 (Liouville's theorem) A canonical transform is a probability-preserving transformation (i.e. preserves phase space volume). Equivalently, the Jacobian \(J\) satisfies \(\det J=1\) in the \(\xi\) basis.

Proof: The \(n\)-th exterior power \(\omega^{\wedge n}\) is (up to a constant) the phase-space volume form; a canonical transform preserves \(\omega\) under pullback, hence also \(\omega^{\wedge n}\), hence volume. In components, \(J\Gamma J^T = \Gamma\) implies \((\det J)^2 = 1\), and for symplectic \(J\) in fact \(\det J = 1\).

We will state, instead of prove, the following result; the most concise proof consists of computing the Lie derivative of \(\omega\) along \(D_A\) using Cartan’s formula.

Theorem 4.4 Given a scalar function \(A\), consider the transform \(\xi \mapsto e^{\lambda D_A}\xi\) induced by its symplectic gradient; then for \(J=J_{\xi \to e^{\lambda D_A}\xi}\), we obtain \[ J\Gamma J^T = \Gamma, \quad \det J = 1 \] In other words, every transform of phase space generated by a Hamiltonian flow is a canonical transform.