2 Lagrange’s Equations

In this section we develop two perspectives on Lagrange’s equations:

A “covariant” formulation of Newton Cartesian equations.
Equation of the trajectory satisfying the principle of least action.

We note in the subfinal section that the two perspectives are related: the Euler-Lagrange vector \(\mathbb L_q\) is (negative) the representing element of the directional derivative map \(\eta \mapsto (D_\eta \mathcal S)(q)\), which is a linear functional in the endpoint-preserving perturbation \(\eta\).

Both perspectives are extremely important to the development of modern theoretical physics. Computationally, make sure to understand lemma 2.5, which appears again in the proof of Noether’s theorem.

Covariant formulation

Covariance: introduction

What do we mean by something covariant?

Consider a function \(f:\mathbb R^2\to \mathbb R\) defined on the 2D plane according to \[ f_1(x, y) = x^2+y^2 \] This function is, in a sense, “equivalent” to the definition \[ f_2(r, \theta) = r^2 \] Although \(f_1, f_2\) are defined on different domains, they represent the same function \(f\) defined on the \(2D\) plane: the choice of \((x, y)\) versus \((r, \theta)\) are simply different choices of “charts” to represent the same underlying geometric object (manifold). When we specify \(f(r, \theta)=r^2\) and understand \((r, \theta)\) as a chart for \(\mathbb R^2\), we have equivalently defined \(f(x, y)=x^2+y^2\). In this sense, \(f\) is covariant.

Remark. Covariant objects are, in a sense, analogous to “platonic ideas.” Another prime example of a covariant object is an abstract linear transformation (e.g. rotation): in this case the matrix representations are obtained after fixing a chart (basis).

Remark. If you want to dig deeper, study differential geometry!

Euler-Lagrange Equations

Let \((q_j)=\{q_1, \cdots, q_N\}\) denote the degrees of freedom. A trajectory’s “snapshot” at a single time is completely captured by \(((q_j), (\dot q_j))\). A Lagrangian is a scalar (function) which looks at such snapshots and evaluates to a number.

Definition 2.1 (Lagrangian) The Lagrangian of a physical system is a map of type \[ \mathcal L: \mathbb R^n\times \mathbb R^n \times \mathbb R\to \mathbb R \] Its arguments are abbreviated \(\mathcal L(q, \dot q, t)\) but expands to \[ \mathcal L(q_1, \cdots, q_n, \dot q_1, \cdots, \dot q_n, t) \]

To be precise, the Lagrangian is a function on the tangent bundle \(TM\) of the configuration manifold \(TM\).

Definition 2.2 (Euler-Lagrange Equations) According to the EL-equations, a trajectory \(q(t)\) is physical if \[ 0 = \mathbb L_q(t) = (d_t \nabla_{\dot q} - \nabla_q)\mathcal L(q, \dot q, t) \in \mathbb R^n \] for all \(t\). The notation above unpacks to \(\forall j=1, \cdots, n\). \[ d_t (\partial_{\dot q_j} \mathcal L) = \partial_{q_j} \mathcal L \] the time-dependent Euler-Lagrange vector \(\mathbb L_q\) captures the deviation from \(0\); we use the subscript to denote explicit dependence on the coordinates \(q\).

In Cartesian coordinates, define the Newtonian Lagrangian to be \[ \mathcal L(x, \dot x, t) = T - V = \dfrac m 2 \dot q^2 - V(q, t) \] Then Lagrange’s equations (definition 2.2) simplify exactly to Newton’s equations. \[ m\ddot x = \nabla_x V \tag{2.1} \]

Covariance of Euler-Lagrange Equations

The following theorem establishes that the following statements are equivalent:

Newton’s equations (2.1) are true in Cartesian coordinates
Lagrange’s equations are true in arbitrary invertible coordinates.

Theorem 2.1 (Covariance of Euler-Lagrange equations) Given a coordinate transform \(x\to q\), the EL-vector transforms covariantly according to \[ \mathbb L_q(t) = J_{q\to x} \mathbb L_x(t) \]

To prove this, we need to establish several lemmas. We begin by looking at how the change of coordinates \(q\to x\) determines \((q, \dot q)\to (x, \dot x)\).

Lemma 2.1 (dot cancellation) \(J_{\dot q\to \dot x} = J_{q\to x}\).

Proof: Follows from chain rule: \[\begin{align} \dot x_j &= d_t x_j(q_1, \cdots, q_j) = (\partial_{q_k} x_j) \dot q_k \implies \dot x = J_{q\to x} \dot q \end{align}\] Now, \(J_{q\to x}\) is independent of \(\dot q\), so \(J_{\dot q\to \dot x} = J_{q\to x}\).

Lemma 2.2 \(J_{q\to \dot x} = d_t J_{q\to x}\)

Proof: On the LHS, \(\partial_{q_i} \dot x_j = \partial_{q_i} (\dot q_k \partial_{q_k} x_j) = \dot q_k (\partial_{q_iq_k}^2 x_j)\). On the RHS, \((d_t J_{q\to x}(q_1, \cdots, q_n))_{ij} = (\partial_{q_i q_k}^2 x_j) \dot q_k\).

The next two lemmas essentially restate the chain rule for \(q(x)\) and \(\dot q(x, \dot x)\).

Lemma 2.3 \(\nabla_{\dot x} = J_{\dot q\to \dot x} \nabla_{\dot q}\).

Lemma 2.4 \(\nabla_x = J_{\dot q\to x} \nabla_q + J_{q\to x}\nabla_x\).

We are now ready to prove theorem 2.1 by expanding the operator \(d_t\nabla_{\dot x} - \nabla_x\): \[\begin{align} d_t \nabla_{\dot x} - \nabla_x &= d_t (J_{\dot q\to \dot x} \nabla_{\dot q}) - J_{q\to x}\nabla_q - J_{x\to \dot q}\nabla_{\dot q} \\ &= (d_t J_{\dot q\to \dot x})\nabla_{\dot q} + J_{\dot q\to \dot x} d_t\nabla_{\dot q} - J_{q\to x}\nabla_q - J_{q\to \dot x}\nabla_{\dot q} \\ &= (d_tJ_{x\to q})\nabla_{\dot q} + J_{q\to x} d_t\nabla_{\dot q} - J_{q\to x}\nabla_q - J_{q\to \dot x}\nabla_{\dot q} \\ &= J_{q\to \dot x}\nabla_{\dot q} + J_{q\to x} d_t\nabla_{\dot q} - J_{q\to x}\nabla_q - J_{q\to \dot x}\nabla_{\dot q} \\ &= J_{q\to x} d_t\nabla_{\dot q} - J_{q\to x}\nabla_q \\ &= J_{q\to x}(d_t \nabla_{\dot q} - \nabla_q) \end{align}\] We applied lemmas 2.3 and 2.4 on line \(1\), applied the product rule to \(J_{\dot x\to \dot q}\nabla_{\dot q}\) to obtain line \(2\) line \(2\), then cancelled the dots by lemma 2.1 to obtain line \(3\). Finally, we applied lemma 2.2 to line \(3\), cancelled terms, and regrouped.

Constraints

Sometimes it is convenient to consider an over-parameterized change of coordinates \(x\to q\). In this case, \(J_{q\to x}\) is “lean and tall”, while \(J_{x\to q}\) “short and wide.” Recall the covariant formula \[\begin{align} \mathbb L_x &= J_{x\to q} \mathbb L_q \end{align}\] When \(J_{x\to q}\) is full rank, the physicality condition \[ \mathbb L_x = 0 \iff \mathbb L_q=0 \] When \(q\) is overparameterized, however, requiring \(\mathbb L_q=0\) is too strong: the correct condition is \[ \mathbb L_x = 0 \iff \mathbb L_q\in \ker J_{x\to q} \]

Consider a single constraint enforced by some constraint function \(f(q)=0\) of signature \(f:\mathbb R^n\to \mathbb R\).

Proposition 2.1 Given a constraint \(f:\mathbb R^N\to \mathbb R\) and a trajectory \(q(t)\), if \(f(q(t))=0\) for all points on the trajectory, then \[ \nabla f \in \ker J_{x\to q} \] To be precise, the gradient \((\nabla f)(q(t))\) evaluated at all points \(q(t)\) for all \(t\) is in the kernel of \(J_{x\to q}(q(t))\) evaluated at \(q(t)\).

Proof: Recognizing the chain rule: \(J_{x\to q} \nabla\big|_q f = J_{x\to f}=0\) since \(f\) is constant.

Definition 2.3 (complete, independent constraints) Given an over-parameterized system with configuration in \(\mathbb R^{m>n}\), where there are only \(n\) true degrees of freedom, a set of constraints \(\{f_j:\mathbb R^m\to \mathbb R\}_{j=1}^k\) is independent if for all possible configurations \(q\in \mathbb R^m\), the set of vectors \[ C(q) = \{\nabla f_j\big|_q\}_{j=1}^k \] are linearly independent. They are complete if \(C(q)\) spans \(\ker J_{x\to q}(q)\) for all points \(q\).

All of this is a fancy way of saying that \(f\) is complete and independent iff they provide necessary and sufficient information to discern physicality \(\mathbb L_q\in \ker J_{x\to q}\). Given these constraints, \(\mathbb L_x=0\) iff \[\begin{align} \mathbb L_q(q, \dot q) &= \sum_j \lambda_j \nabla_q f_j(q) \in \ker J_{x\to q}, \quad \lambda_j \in \mathbb R\\ f_j(q) &= 0 \end{align}\] This constrained problem can be solved by the standard Lagrange multiplier method.

Extremization formulation

In this subsection, we show that the Euler-Lagrange equations are fulfilled if and only if a certain action functional is extremized. This yields a powerful variational perspective on physicality.

Definition 2.4 (action functional) Given a Lagrangian \(\mathcal L(q, \dot q, t)\) (definition 2.1), its associated action functional \(\mathcal S_{[a, b]}(p)\) takes in \(3\) arguments:

Begining time \(a\in \mathbb R\).
End time \(b\in \mathbb R\).
A path \(q:[a, b]\to \mathbb R^n\).

and outputs the scalar \[ \mathcal S_{[a, b]}(p) = \int_a^b \mathcal L(q(t), \dot q(t), t)\, dt \]

To properly define extremization, we need the directional derivative. Fixing \(a, b\), the directional derivative \((D_\eta \mathcal S_{[a, b]})(q)\) tells us how much \(\mathcal S(q)\) changes if we nudge \(q\) in the direction of \(\eta\) infinitesimally.

Definition 2.5 (directional derivative of action) Fixing \(a, b\), the directional derivative \((D_\eta \mathcal S_{[a, b]})(q)\) takes in two arguments:

The perturbing path \(\eta: [a, b]\to \mathbb R^n\).
The path at which the perturbation is evaluated \(q:[a, b]\to \mathbb R^n\).

and outputs the scalar slope of \(\mathcal S_{[a, b]}(q+\epsilon \eta)\) w.r.t \(\epsilon\), evaluated at \(0\): \[ (D_\eta \mathcal S_{[a, b]})(q) = \lim_{\epsilon \to 0} \dfrac{\mathcal S_{[a, b]}(q) + \mathcal S_{[a, b]}(q+\epsilon \eta)}{\epsilon} \] To reduce notation clutter, we abbreviate this as \[ D_\eta S(q) = d_\epsilon \big|_0 S(q+\epsilon \eta) \]

The magic in the air is that the Euler-Lagrange vector (definition 2.2) \(\mathbb L_q\) gives us the directional derivative when the input perturbation is endpoint-preserving.

Theorem 2.2 (endpoint-preserving derivative) For endpoint-preserving \(\eta:[a, b]\to \mathbb R\) such that \(\eta(a)=\eta(b)=0\), the directional derivative of \(\mathcal S\) is given by \[ (D_\eta \mathcal S)(q) = -\int_a^b \mathbb L_q(t) \eta(t)\, dt \]

Proof: Fixing \(q, \eta\), for any \(\epsilon>0\) let \(q_\epsilon:[a, b]\to \mathbb R^n\) denote the family of paths given by \[ q_\epsilon(t) = q(t) + \epsilon \eta(t) \tag{2.2} \] Slide \(d_\epsilon\) into the integral and apply lemma 2.5, then apply the endpoint-vanishing condition \[\begin{align} (D_\eta \mathcal S)(q) &= d_\epsilon \big|_{\epsilon=0} \mathcal S(q+\epsilon \eta) = \int_a^b d_\epsilon \mathcal L(q_\epsilon, \dot q_\epsilon, t)\, dt \\ &= \int_a^b \left[ -\eta \cdot \mathbb L_q + d_t \left(\eta \cdot \nabla_{\dot q}\mathcal L\right) \right]\, dt \\ &= \left(\eta \cdot \nabla_{\dot q}\mathcal L\right)\big|^b_a -\int_a^b \eta \cdot \mathbb L_q\, dt = -\int_a^b \eta \cdot \mathbb L_q\, dt \end{align}\]

The following lemma is the key component in the calculus of variations. Make sure to read it carefully.

Lemma 2.5 (ε-derivative of the perturbed Lagrangian) With \(q_\epsilon=q + \epsilon \eta\) as in (2.2) \[ d_\epsilon \mathcal L(q_\epsilon, \dot q_\epsilon) = -\eta \cdot \mathbb L_q + d_t \left(\eta \cdot \nabla_{\dot q}\mathcal L\right) \]

Proof: Using the dependence \(\epsilon \to (q_\epsilon, \dot q_\epsilon) \to \mathcal L\), apply the chain rule \[\begin{align} d_\epsilon \mathcal L(q_\epsilon, \dot q_\epsilon) &= (d_\epsilon q_\epsilon) \cdot (\nabla_q \mathcal L) + (d_\epsilon \dot q_\epsilon) \cdot (\nabla_{\dot q} \mathcal L) \\ &= \eta \cdot \nabla_q \mathcal L + d_t \eta \cdot \nabla_{\dot q} \mathcal L \\ &= \eta \cdot \nabla_q \mathcal L + d_t \left(\eta \cdot \nabla_{\dot q}\mathcal L \right) - \eta \cdot d_t \nabla_{\dot q}\mathcal L \\ &= \eta \cdot (\nabla_q - d_t \nabla_{\dot q})\mathcal L + d_t \left(\eta \cdot \nabla_{\dot q} \mathcal L\right)\\ &= -\eta \cdot \mathbb L_q + d_t \left(\eta \cdot \nabla_{\dot q}\mathcal L\right) \end{align}\] On the second line, we used integration by parts on the time-derivative to transfer the time-derivative on \(\eta\) (which we don’t like) to \(\nabla_{\dot q}\mathcal L\) (which we like, since it appears in \(\mathbb L_q\)).

It is a standard result in real analysis (fundamental lemma of the calculus of variations) that for \[ (D_\eta \mathcal S)(q) = -\int_a^b \mathbb L_q(t) \eta(t)\, dt \] to be \(0\) for all reasonable endpoint-preserving perturbations \(\eta\) (which includes, in particular, compactly supported smoth functions), \(\mathbb L_q(t)\) must be \(0\). This allows us to prove another equivalence result.

Theorem 2.3 (Hamilton's principle) Recall that, given a Lagrangian \(\mathcal L(q, \dot q, \epsilon)\), we call a path \(q:[a, b]\to \mathbb R^n\) physical iff \[ \mathbb L_q(t)=0, \quad t\in [a, b], \quad \mathbb L_q(t) = (d_t \nabla_{\dot q} - \nabla_q) \mathcal L(q(t), \dot q(t), t) \] Then a path \(q\) is physical iff it extremizes the action functional at \(\mathcal S_{[a, b]}(q)\) (definition 2.4) w.r.t all endpoint-preserving perturbations \(\eta:[a, b]\to \mathbb R^n, \eta(a)=\eta(b)=0\) \[ (D_\eta \mathcal S)(q) = 0 \]

Representation perspective

In this section we give some intuition to the equation \[ (D_\eta \mathcal S)(q) = -\langle\mathbb L_q, \eta\rangle, \quad \langle f, g\rangle= \int_a^b f(t)g(t)\, dt \tag{2.3} \]

To motivate this, consider a finite-dimensional real vector space \(V\) equipped with an inner product \(\langle a, b\rangle\). A (real) linear functional \(\alpha\) on the space of \(V\) is a function of the signature \(\alpha:V\to \mathbb R\) satisfying linearity: \[ \alpha(u + c v) = \alpha(u) + c\alpha(v), \quad \forall u, v\in V, \quad c\in \mathbb R \] Some thought shows that every linear functional is uniquely specified by another vector \(a\in V\) according to \(\alpha(v) = \langle a, v\rangle\). Fix any basis \(\{b_1, \cdots b_n\}\), given \(\alpha:V\to \mathbb R\) we let \[ a = \sum_{j=1}^n \alpha(b_j) b_j \implies \langle a, \sum_{j=1}^N c_j b_j\rangle= \sum c_j \alpha(b_j) = \alpha\left( \sum_{j=1}^N c_j b_j \right) \] Fixing an interval \(a, b\) and classical trajectory \(q\) on \(a, b\), note the following facts:

the set of all endpoint-preserving perturbations \(\eta:[a, b]\to \mathbb R^n\) form a vector space \(V\).
The map \(\eta\mapsto (D_\eta \mathcal S)(q)\) is a linear functional w.r.t. \(\eta\): this is the main content of theorem 2.2. With regards to our previous discussion, theorem 2.2 equivalently states that we can regard \[ -\mathbb L_q = (\nabla_q - d_t \nabla_{\dot q})\mathcal L(q, \dot q) \] as the representation of the linear functional \(\eta \mapsto (D_\eta \mathcal S)(q)\), just as how \(a\in V\) represents \(\alpha:V\to \mathbb R\).

Separability of quadratic Lagrangians

The following result is also used in the path integral formulation of quantum mechanics.

Proposition 2.2 (Actions of quadratic Lagrangians are separable about the classical trajectory) Given a quadratic Lagrangian \[ \mathcal L(q, \dot q) = u\, \dot q^2 + v\, q^2 \] for path \(q\) wich satisfies the Euler-Lagrange equations, the Lagrangian satisfies \[ \mathcal L(q+\eta, \dot q + \dot \eta) = \mathcal L(q, \dot q) + \mathcal L(\eta, \dot \eta) = \mathcal L(q, \dot q) + \epsilon \mathcal L(\eta, \dot \eta) + d_t \left[\eta \cdot \nabla_{\dot q}\mathcal L(q, \dot q)\right] \] In particular, if the perturbation \(\eta:[a, b]\to \mathbb R^n\) is additionally endpoint preserving so \(\eta(a)=\eta(b)=0\), then \[ \mathcal S(q+\eta) = \mathcal S(q)+\mathcal S(\eta) \]

Proof: Consider the function in \(\epsilon:\mathbb R\to \mathbb R\) defined by \[\begin{align} \mathcal L(\epsilon) &= \mathcal L(q + \epsilon \eta, \dot q + \epsilon \dot \eta) = u (\dot q + \epsilon \dot \eta)^2 + v(q + \epsilon \eta)^2 \\ \end{align}\] It helps at this point to compute some of its derivatives \[\begin{align} \eta \cdot \nabla_q \mathcal L &= \eta \cdot 2v\epsilon (q + \epsilon \eta), \quad \eta \cdot \nabla_{\dot q} \mathcal L = \dot \eta \cdot 2u\epsilon (\dot q + \epsilon \dot \eta) \\ \dfrac 1 2 (\eta \cdot \nabla_q)^2 \mathcal L &= (\eta \cdot \nabla_q)\left[ \eta \cdot 2v\epsilon (q + \epsilon \eta) \right] = \epsilon v \, \eta^2 \\ \dfrac 1 2 (\eta \cdot \nabla_{\dot q})^2 \mathcal L &= (\dot \eta \cdot \nabla_{\dot q}) \left[\dot \eta \cdot 2u\epsilon (\dot q + \epsilon \dot \eta) \right] = \epsilon u \, \dot \eta^2 \\ [(\nabla \cdot \nabla_q)(\dot \eta \cdot \nabla_{\dot q})] \mathcal L &= [(\dot \eta \cdot \nabla_{\dot q})(\nabla \cdot \nabla_q)] \mathcal L = 0 \\ \dfrac 1 2 (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q})^2 \mathcal L(\epsilon) &= \epsilon \mathcal L(\eta, \dot \eta) \end{align}\] Taylor expanding about \(\epsilon=0\) yields \[\begin{align} \mathcal L(q + \epsilon \eta, \dot q + \epsilon \dot \eta) &= \mathcal L(q, \dot q) + \sum_{k=1}^\infty \dfrac{ (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q})^k } {k!} \mathcal L(q + \epsilon \eta, \dot q + \epsilon \dot \eta) \\ &= \mathcal L(0) + (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q}) \mathcal L + \dfrac 1 2 (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q})^2 \mathcal L \\ &= \mathcal L(q, \dot q) + \epsilon \mathcal L(\eta, \dot \eta) + (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q}) \mathcal L \tag{2.4} \end{align}\] We will now show that the extra last term (which came from the linear expansion) is zero when \(\eta\) is time-dependent. We do this by using the product rule and the condition that at \(\epsilon=0\), the path satisfies the Euler-Lagrange equation \[\begin{align} (\eta \cdot \nabla_q + \dot \eta \cdot \nabla_{\dot q})\mathcal L &= [\eta \cdot \nabla_q + d_t(\eta \cdot \nabla_{\dot q}) - \eta \cdot d_t \nabla_{\dot q}] \mathcal L \\ &= d_t(\eta \cdot \nabla_{\dot q})\mathcal L \end{align}\] Where we recognized \(\eta \cdot \nabla_q - \eta \cdot (d_t\nabla_{\dot q})\mathcal L = -\eta \cdot \mathbb L = 0\). Substituting back into equation (2.4) \[ \mathcal L(q + \epsilon \eta, \dot q + \epsilon \dot \eta) = \mathcal L(q, \dot q) + \epsilon \mathcal L(\eta, \dot \eta) + d_t \left[\eta \cdot \nabla_{\dot q}\mathcal L(q, \dot q)\right] \] Substitute \(\epsilon=1\) and integrate to obtain the action. The last total-derivative term vanishes by endpoint-preserving property of \(\eta\).