Disclaimer: This post was translated into English by an AI model. It may contain mistakes or awkward wording.

Not long ago I accidentally dug up a course assignment I wrote last year for a class on frontiers of pure mathematics. I had just finished reading Quantum Mechanics: The Theoretical Minimum, so I wrote about elementary quantum mechanics. Because it was rushed, it is not very detailed; for example, it does not even mention density matrices. Its main emphasis is on eigenvalues. Perhaps it is useful as a simple summary or introduction to quantum mechanics, so I rewrote the LaTeX into Markdown and posted it here. It does feel a bit like filler.

The title is deliberately written this way, to emphasize that I know absolutely nothing about quantum mechanics.

Eigenvalue Problems in Quantum Mechanics

Where is the dividing line between quantum and classical? Two differences matter most.

Observation affects state. In a classical system, when we observe a physical quantity, hereafter an observable, we usually assume that the effect of observation on the observed system is small enough to ignore. A quantum system is extremely small, so observing it has a non-negligible effect. This leads to the distinction in quantum mechanics between state and observation result.

The logical foundation is different. In classical mechanics, for the proposition A and B, we can separately verify the truth values of A and B, and then infer the truth value of A and B through the meaning of "and". In quantum mechanics, because observation affects state, if observing A can affect the truth value of B, then A and B is no longer equivalent to B and A. This means quantum mechanics uses a completely different logical foundation.

For simplicity, the following mainly uses the simplest discrete observable, spin, or a qubit, as an example, and then generalizes to continuous observables.

State

The simplest quantum system is a single-spin system. In such a system, we can prepare the observable \(\sigma\) in a particular direction, so that measuring in that direction always gives \(\sigma = 1\). Since \(\sigma\) is a 3-vector with components in three directions, we can regard spin in the up-down, left-right, and in-out directions as three observables. Suppose the direction we prepare is upward, written \(\sigma_u = 1\). Conversely, measuring in the downward direction will always give -1, written \(\sigma_d = -1\).

What about measuring in the left-right or in-out directions? Very unlike classical physics, measuring \(\sigma_l\) in the left direction or \(\sigma_i\) inward produces random values -1 and +1. But if we measure many times, either by preparing the system in the previous state each time or by measuring many systems in the same state, we find that -1 and +1 occur with roughly equal probability.

For this reason we introduce the concept of a state vector: a state is represented by a column vector of complex numbers. The upward-spin system above can be written as \(|\psi\rangle = 1 |u\rangle + 0 |d\rangle\). Thus the state space is a two-dimensional vector space over complex numbers, with basis \(\lbrace|u\rangle, |d\rangle\rbrace\). Here \(|u\rangle\) means spin up, and \(|d\rangle\) means spin down.

A general state vector can be written as \(|\psi\rangle = \alpha_u |u\rangle + \alpha_d |d\rangle\). If both \(\alpha_d\) and \(\alpha_u\) are nonzero, it is called a superposition. What constraints do the coefficients satisfy? Physical experiments show that \(\alpha_u^*\alpha_u\) is the probability of measuring +1 in the upward direction. If we observe in the upward direction, we measure either +1 or -1, so the probabilities must normalize: \(P_u + P_d = 1\), where \(P_u = \alpha_u^*\alpha_u\), \(P_d = \alpha_d^*\alpha_d\), and \(x^*\) denotes the complex conjugate of \(x\). In other words, \(\langle\psi|\psi\rangle = 1\).
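As a small illustration of the normalization rule, here is a sketch in plain Python. The function names and the sample coefficients are made up for this example and are not part of the original assignment.

```python
import math

def probabilities(alpha_u, alpha_d):
    """Return (P_u, P_d) for the state alpha_u|u> + alpha_d|d>."""
    p_u = (complex(alpha_u).conjugate() * alpha_u).real
    p_d = (complex(alpha_d).conjugate() * alpha_d).real
    return p_u, p_d

def normalize(alpha_u, alpha_d):
    """Rescale the coefficients so that <psi|psi> = 1."""
    p_u, p_d = probabilities(alpha_u, alpha_d)
    norm = math.sqrt(p_u + p_d)
    return alpha_u / norm, alpha_d / norm

# Hypothetical unnormalized coefficients, chosen only for illustration.
alpha_u, alpha_d = normalize(3, 4j)
p_u, p_d = probabilities(alpha_u, alpha_d)
print(p_u, p_d, p_u + p_d)
```

After normalization the two probabilities add up to 1, even though one coefficient is real and the other purely imaginary.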

Observables

As seen above, although the spin points up or down, we can still measure it in the left-right and in-out directions; the results are merely random. From the probability \(\frac{1}{2}\) of measuring +1, together with normalization and orthogonality, we can calculate:

$$ |r\rangle = \frac{1}{\sqrt 2}|u\rangle + \frac{1}{\sqrt 2}|d\rangle $$ $$ |l\rangle = \frac{1}{\sqrt 2}|u\rangle - \frac{1}{\sqrt 2}|d\rangle $$ $$ |i\rangle = \frac{1}{\sqrt 2}|u\rangle + \frac{i}{\sqrt 2}|d\rangle $$ $$ |o\rangle = \frac{1}{\sqrt 2}|u\rangle - \frac{i}{\sqrt 2}|d\rangle $$
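These four expansions can be checked numerically. The sketch below assumes the column representation \(|u\rangle = (1, 0)^T\), \(|d\rangle = (0, 1)^T\); the helper names `inner` and `prob` are introduced here only for illustration.

```python
import math

s = 1 / math.sqrt(2)
up, down = [1, 0], [0, 1]            # assumed column representation
right, left = [s, s], [s, -s]
inward, outward = [s, 1j * s], [s, -1j * s]

def inner(a, b):
    """<a|b>: conjugate the components of a, then sum the products."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def prob(outcome, state):
    """Probability |<outcome|state>|^2 of finding `state` in `outcome`."""
    return abs(inner(outcome, state)) ** 2

# Each pair of opposite directions is orthogonal.
print(abs(inner(right, left)), abs(inner(inward, outward)))
# A spin prepared along one axis gives +1 along another axis half the time.
print(prob(up, right), prob(up, inward), prob(right, inward))
```

The complex factor \(i\) in \(|i\rangle\) and \(|o\rangle\) is what allows the in-out direction to be unbiased with respect to both the up-down and the left-right directions.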

This shows that \(|u\rangle, |d\rangle\) form a complete basis. For convenience, we want a single object that packages the possible measurement results together with the states in which they occur. The mathematical tool we find is the matrix: we can choose a \(2 \times 2\) matrix whose eigenvectors form a complete orthogonal system.

We want the matrix, written \(\sigma_z\), to have eigenvectors exactly \(\vert u\rangle\) and \(\vert d\rangle\), while the eigenvalues themselves express measurement results:

$$ \begin{align*} \sigma_z |u\rangle &= |u\rangle \\ \sigma_z |d\rangle &= -|d\rangle \end{align*} $$

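If we choose the concrete representation \(\vert u\rangle = (1, 0)^T, \vert d\rangle=(0,1)^T\), then it is easy to compute \(\sigma_z\). Requiring in the same way that \(\sigma_x\) has eigenvectors \(\vert r\rangle, \vert l\rangle\) and \(\sigma_y\) has eigenvectors \(\vert i\rangle, \vert o\rangle\), with eigenvalues \(+1\) and \(-1\), gives: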

$$ \begin{align*} \sigma_z =& \left( \begin{array}{cc} 1&0\\ 0&-1 \end{array} \right)\\ \sigma_x =& \left( \begin{array}{cc} 0&1\\ 1&0 \end{array} \right)\\ \sigma_y =& \left( \begin{array}{cc} 0&-i\\ i&0 \end{array} \right) \end{align*} $$

These are the famous Pauli matrices.
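The eigenvalue relations can be verified directly. This is a minimal sketch in plain Python using nested lists as matrices; `apply` and `is_eigen` are helper names invented for the example.

```python
import math

SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]

s = 1 / math.sqrt(2)
up, down = [1, 0], [0, 1]
right, left = [s, s], [s, -s]
inward, outward = [s, 1j * s], [s, -1j * s]

def apply(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_eigen(matrix, vector, value, tol=1e-12):
    """True if matrix @ vector equals value * vector up to rounding."""
    image = apply(matrix, vector)
    return all(abs(a - value * b) < tol for a, b in zip(image, vector))

checks = [
    is_eigen(SIGMA_Z, up, 1), is_eigen(SIGMA_Z, down, -1),
    is_eigen(SIGMA_X, right, 1), is_eigen(SIGMA_X, left, -1),
    is_eigen(SIGMA_Y, inward, 1), is_eigen(SIGMA_Y, outward, -1),
]
print(all(checks))
```

Each matrix has exactly the two eigenvalues \(+1\) and \(-1\), the only values a spin measurement can return.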

In physics, an observation result \(r\) must be real, i.e. \(r^* = r\). Quantum mechanics is no exception. An observable is represented by a linear operator \(\bf{L}\), and this operator must satisfy \(\bf{L}^\dagger = \bf{L}\). That is, \(\bf{L}\) must be Hermitian. It is easy to prove that the eigenvalues of Hermitian operators are real, matching our expectation.
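One way to fill in the proof, assuming \(\vert\lambda\rangle\) is a normalized eigenvector of \(\bf{L}\) with eigenvalue \(\lambda\):

$$ \begin{align*} \langle\lambda\vert \mathbf{L}\vert\lambda\rangle &= \lambda\langle\lambda\vert\lambda\rangle = \lambda \\ \langle\lambda\vert \mathbf{L}\vert\lambda\rangle^* &= \langle\lambda\vert \mathbf{L}^\dagger\vert\lambda\rangle = \langle\lambda\vert \mathbf{L}\vert\lambda\rangle \end{align*} $$

The first line gives \(\lambda\) and the second says this number equals its own complex conjugate, so \(\lambda^* = \lambda\).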

From here we can see that linear algebra plays a central role in quantum mechanics: almost every physical quantity is represented as a linear operator, and we discuss its eigenvalues and eigenvectors.

Composite Systems and Quantum Entanglement

One major difference from classical systems is that composite quantum systems can exhibit entanglement.

For example, two spin systems A and B can be placed together to form a composite system. Their state spaces are \(S_A\) and \(S_B\), so the state space of the composite system AB is \(S_{AB} = S_A\otimes S_B\), where \(\otimes\) denotes the tensor product.

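If the states \(\vert A\rangle = \alpha_u\vert u\rangle + \alpha_d\vert d\rangle\) and \(\vert B\rangle = \beta_u\vert u\rangle + \beta_d\vert d\rangle\) of A and B are both known (the observables of A commute with those of B, so both states can be specified simultaneously; see below), then the state of the composite system is: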

$$\vert AB\rangle = \vert A\rangle \otimes \vert B\rangle = \alpha_u\beta_u\vert uu\rangle + \alpha_u\beta_d\vert ud\rangle + \alpha_d\beta_u\vert du\rangle + \alpha_d\beta_d \vert dd\rangle$$

A state that can be decomposed into a product of two states is called a product state.

But in general, some states cannot be decomposed this way, such as \(\vert sing\rangle = \frac{1}{\sqrt 2}(\vert ud\rangle - \vert du\rangle)\). Let us examine this state. For convenience, write A's spin operator as \(\sigma\), and B's spin operator as \(\tau\).
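That \(\vert sing\rangle\) cannot be factored can be checked from the product form above. Writing a general two-spin state with coefficients \(c_{uu}, c_{ud}, c_{du}, c_{dd}\) (notation introduced here only for this check), every product state satisfies:

$$ c_{uu}c_{dd} = (\alpha_u\beta_u)(\alpha_d\beta_d) = (\alpha_u\beta_d)(\alpha_d\beta_u) = c_{ud}c_{du} $$

For \(\vert sing\rangle\) we have \(c_{uu}c_{dd} = 0\) but \(c_{ud}c_{du} = -\frac{1}{2}\), so no choice of \(\alpha\) and \(\beta\) reproduces it.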

Clearly, \(\tau\) does not affect \(\vert A\rangle\), and \(\sigma\) does not affect \(\vert B\rangle\). Computing the expectation of \(\sigma_z\), we get \(\langle \sigma_z \rangle = \langle sing \vert \sigma_z \vert sing \rangle =0\). Similarly, \(\langle \sigma_x \rangle = \langle \sigma_y \rangle=\langle \tau_x \rangle= \langle\tau_y\rangle=\langle\tau_z\rangle = 0\). This means that in this state, even if we know the state of the whole system, we cannot predict the measurement result of either subsystem.

Conversely, the composite observable \(\tau_z \sigma_z\) must have observation value -1, because \(\tau_z\sigma_z\vert sing\rangle = - \vert sing\rangle\). This means that if A measures \(\sigma_z = +1\), then B must measure \(\tau_z=-1\).

Neither fact alone may be surprising, but together they show something striking: for the composite state \(\vert sing\rangle\), we know nothing about the subsystems individually, yet we can find relationships between their observation results. A state like \(\vert sing\rangle\), where knowing one observation result lets us know the other deterministically, is called a maximally entangled state.
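Both facts can be reproduced numerically. The sketch below is plain Python; it assumes the basis order \(\vert uu\rangle, \vert ud\rangle, \vert du\rangle, \vert dd\rangle\) with A's spin written first, and the helper names `kron` and `expectation` are made up for the example.

```python
import math

PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}
IDENTITY = [[1, 0], [0, 1]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def expectation(op, psi):
    """<psi|op|psi> for a normalized state psi."""
    n = len(psi)
    op_psi = [sum(op[i][j] * psi[j] for j in range(n)) for i in range(n)]
    return sum(complex(psi[i]).conjugate() * op_psi[i] for i in range(n))

s = 1 / math.sqrt(2)
sing = [0, s, -s, 0]  # (|ud> - |du>) / sqrt(2)

sigma = {k: kron(m, IDENTITY) for k, m in PAULI.items()}  # acts on A only
tau = {k: kron(IDENTITY, m) for k, m in PAULI.items()}    # acts on B only

single_spin = [expectation(op, sing)
               for op in list(sigma.values()) + list(tau.values())]
correlation_zz = expectation(kron(PAULI["z"], PAULI["z"]), sing)

print([abs(v) for v in single_spin])  # all six single-spin averages vanish
print(correlation_zz.real)            # the product tau_z sigma_z is fixed
```

The six single-spin expectations are zero while the product observable has a definite value, which is the numerical face of the statement above.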

From Discrete to Continuous: Hilbert Space

To study motion, we need to study observables with continuous values, such as position and momentum.

First introduce the wave function. If a discrete state \(\vert \psi\rangle = \sum_i \alpha_i\vert \lambda_i\rangle\) is known, where \(\vert \lambda_i\rangle\) is the eigenvector belonging to the eigenvalue \(\lambda_i\), define the wave function by:

$$ \psi(\vert \lambda_i\rangle) = \alpha_i $$

that is, the coefficient of the state \(\vert \lambda_i\rangle\). From the physical meaning of the coefficient:

$$ P(\lambda_i) = \psi(\vert \lambda_i\rangle)^*\psi(\vert \lambda_i\rangle) $$

By normalization:

$$ \sum_i \psi(\vert \lambda_i\rangle)^*\psi(\vert \lambda_i\rangle) = 1 $$

It is easy to see that wave functions correspond one-to-one with state vectors.

Because functions naturally extend to the continuous case, first preserve normalization:

$$ \int_{-\infty}^{+\infty} \psi(x)^*\psi(x) \mathrm{d}x = 1 $$

For continuous states, consider the inner product in Hilbert space:

$$ \langle\Psi\vert \Phi\rangle = \int_{-\infty}^{+\infty} \psi^*(x) \phi(x) \mathrm{d}x $$

Thus we transform ordinary discrete vectors into special continuous vectors in Hilbert space, namely wave functions, and naturally move from the discrete world to the continuous world.

Two natural questions arise:

  • What are linear operators in Hilbert space like?
  • Under what conditions can these linear operators be called Hermitian?

The first question is simple: anything satisfying the linearity axioms is a linear operator. Two of the simplest operators are \(\bf X\) and \(\bf D\):

$$ {\bf X} \psi(x) = x\psi(x) $$

$$ {\bf D} \psi(x) = \frac{\mathrm{d} \psi(x)}{\mathrm{d} x} $$

It is easy to verify that both are linear. Now consider their Hermitian properties. Generalize Hermitian-ness to:

$$ \langle\Psi\vert {\bf L}\vert \Phi\rangle = \langle\Phi\vert {\bf L}\vert \Psi\rangle^* $$

Verification shows that \(\bf X\) is Hermitian, while \(\bf D\) is anti-Hermitian. Therefore we multiply \(\bf D\) by \(-i\hbar\) to make it Hermitian, and call the result \(\bf P\).
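The verification for \(\bf D\) is an integration by parts, assuming the wave functions vanish at \(\pm\infty\) so that the boundary term drops out:

$$ \begin{align*} \langle\Psi\vert \mathbf{D}\vert \Phi\rangle &= \int_{-\infty}^{+\infty} \psi^*(x) \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} \mathrm{d}x \\ &= -\int_{-\infty}^{+\infty} \frac{\mathrm{d}\psi^*(x)}{\mathrm{d}x} \phi(x) \mathrm{d}x \\ &= -\left(\int_{-\infty}^{+\infty} \phi^*(x) \frac{\mathrm{d}\psi(x)}{\mathrm{d}x} \mathrm{d}x\right)^* = -\langle\Phi\vert \mathbf{D}\vert \Psi\rangle^* \end{align*} $$

The extra minus sign is cancelled by the factor \(-i\): \(\langle\Psi\vert \mathbf{P}\vert \Phi\rangle = -i\hbar\langle\Psi\vert \mathbf{D}\vert \Phi\rangle = i\hbar\langle\Phi\vert \mathbf{D}\vert \Psi\rangle^* = \langle\Phi\vert \mathbf{P}\vert \Psi\rangle^*\).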

Now examine the eigenvalues and eigenvectors of these two operators, \({\bf X}\) and \({\bf P}\).

$$ {\bf X} \psi(x) = x\psi(x) = x_0\psi(x) $$

$$ (x-x_0)\psi(x) = 0 $$

Thus \(\bf X\) has infinitely many eigenvalues \(x_0\). Considering normalization, the eigenvector corresponding to \(x_0\) is \(\psi(x) = \delta(x-x_0)\). The physical meaning is clear: the probability of the particle being at \(x \neq x_0\) is 0, so the particle can only be at \(x=x_0\). This corresponds to position, as expected.

$$ {\bf P} \psi(x) = -i\hbar \frac{\mathrm{d} \psi(x)}{\mathrm{d} x} = p\psi(x) $$ $$ \psi_p(x) = \frac{1}{\sqrt {2\pi}} e^{\frac{ipx}{\hbar}} $$

Thus \(\bf P\) has infinitely many eigenvalues \(p\), with corresponding eigenvectors \(\psi_p(x)\). This operator corresponds to momentum. Note that the vector is expressed in \(x\)-space. Here we can see the shadow of waves, which is why the wave function is called a wave function: \(\psi_p(x+\frac{2\pi\hbar}{p})=\psi_p(x)\).
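A quick numerical sanity check of the eigenvalue equation and of the periodicity, as a sketch only: it assumes units with \(\hbar = 1\), approximates the derivative by a central difference, and uses arbitrary sample values for \(p\) and \(x\).

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1

def psi_p(x, p):
    """Plane wave e^{ipx/hbar} / sqrt(2*pi)."""
    return cmath.exp(1j * p * x / HBAR) / math.sqrt(2 * math.pi)

def apply_p(f, x, h=1e-5):
    """Approximate (P f)(x) = -i*hbar*f'(x) with a central difference."""
    return -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)

p, x = 2.0, 0.7  # hypothetical sample momentum and position

# If psi_p is an eigenfunction, (P psi_p)(x) / psi_p(x) should equal p.
eigenvalue = apply_p(lambda y: psi_p(y, p), x) / psi_p(x, p)

# Shifting x by one wavelength 2*pi*hbar/p should leave psi_p unchanged.
wavelength = 2 * math.pi * HBAR / p
shift_error = abs(psi_p(x + wavelength, p) - psi_p(x, p))

print(eigenvalue, shift_error)
```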

To change the basis of \(\psi_p(x)\) to \(p\), i.e. to transform it into a function whose variable is \(p\), note:

$$ \psi(x) = \langle x \vert \Psi\rangle $$

$$ \langle p\vert x \rangle = \frac{1}{\sqrt {2\pi}} e^{-\frac{ipx}{\hbar}} $$

$$ \tilde \psi(p) = \langle p\vert \Psi\rangle = \int \mathrm{d}x \langle p\vert x \rangle \langle x\vert \Psi \rangle=\frac{1}{\sqrt {2\pi}}\int \mathrm{d}x e^{-\frac{ipx}{\hbar}} \psi(x) $$

This is exactly the Fourier transform of \(\psi(x)\). We get:

$$ \tilde \psi(p) =\frac{1}{\sqrt {2\pi}}\int \mathrm{d}x\, e^{-\frac{ipx}{\hbar}} \psi(x) $$ $$ \psi(x) = \frac{1}{\sqrt {2\pi}}\int \mathrm{d}p\, e^{\frac{ipx}{\hbar}} \tilde\psi(p) $$
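To see the transform at work, the sketch below evaluates \(\tilde\psi(p)\) for a Gaussian wave function by a simple Riemann sum. It assumes units with \(\hbar = 1\) (so that the \(\frac{1}{\sqrt{2\pi}}\) prefactor is the usual Fourier normalization); the Gaussian, the integration range and the step count are choices made for this example.

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1

def psi(x):
    """Normalized Gaussian pi^(-1/4) * exp(-x^2 / 2)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)

def psi_tilde(p, x_max=10.0, n=2000):
    """Riemann-sum approximation of (1/sqrt(2*pi)) * integral e^{-ipx/hbar} psi(x) dx."""
    dx = 2 * x_max / n
    total = 0j
    for k in range(n + 1):
        x = -x_max + k * dx
        total += cmath.exp(-1j * p * x / HBAR) * psi(x)
    return total * dx / math.sqrt(2 * math.pi)

def expected(p):
    """Closed form: with hbar = 1 this Gaussian is its own Fourier transform."""
    return math.pi ** -0.25 * math.exp(-p * p / 2)

for p in (0.0, 0.5, 1.3):
    print(p, abs(psi_tilde(p) - expected(p)))
```

The printed differences should be at the level of rounding error, because the integrand is smooth and negligible at the ends of the range.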

Time and Change

First consider how state changes with time. Suppose:

$$ \vert \psi(t)\rangle = {\bf U}(t)\vert \psi(0)\rangle $$

This represents deterministic spontaneous evolution of the system without external influence. If \(\vert \Psi\rangle\) and \(\vert \Phi\rangle\) are distinguishable (that is, orthogonal) states, they remain orthogonal forever:

$$\langle\Psi(t)\vert \Phi(t)\rangle = 0$$

$$\langle\Phi(t)\vert \Psi(t)\rangle = 0$$

Expanding:

$$\langle\Psi(0)\vert {\bf U}(t)^\dagger {\bf U}(t)\vert \Phi(0)\rangle = 0$$

$$\langle\Phi(0)\vert {\bf U}(t)^\dagger {\bf U}(t)\vert \Psi(0)\rangle = 0$$

Since this holds for every pair of orthogonal states, and the evolution also preserves normalization:

$$ {\bf U}(t)^\dagger {\bf U}(t) = I$$

This means state-vector evolution over time is unitary.

We regard this evolution as continuous:

$$ {\bf U}(\epsilon) = I - i \epsilon \bf H $$

From the unitarity of \(\bf U\), we get:

$$ {\bf H}^\dagger = \bf H $$

So \(\bf H\) is Hermitian. The physical quantity represented by \({\bf H}\) is called the generalized Hamiltonian, representing the total energy of the system.
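The step from unitarity to Hermiticity can be spelled out by keeping terms up to first order in \(\epsilon\):

$$ \begin{align*} {\bf U}(\epsilon)^\dagger {\bf U}(\epsilon) &= (I + i\epsilon {\bf H}^\dagger)(I - i\epsilon {\bf H}) \\ &= I + i\epsilon({\bf H}^\dagger - {\bf H}) + \epsilon^2 {\bf H}^\dagger {\bf H} \end{align*} $$

For this to equal \(I\) at order \(\epsilon\), the middle term must vanish, which requires \({\bf H}^\dagger = {\bf H}\).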

Substituting \({\bf U}(\epsilon) = I - i \epsilon \bf H\) into the first equation of this section gives:

$$ \vert \Psi(\epsilon)\rangle = \vert \Psi(0)\rangle - i\epsilon {\bf H}\vert \Psi(0)\rangle $$ $$ \frac{\vert \Psi(\epsilon)\rangle - \vert \Psi(0)\rangle}{\epsilon} = -i {\bf H}\vert \Psi(0)\rangle $$ $$ \frac{\partial \vert \Psi(t)\rangle}{\partial t} = -i {\bf H}\vert \Psi(t)\rangle $$

Call the last equation the generalized Schrödinger equation. Its dimensions are not correct, since \(\bf H\) represents energy, so we correct it to:

$$ \hbar \frac{\partial \vert \Psi(t)\rangle}{\partial t} = -i {\bf H}\vert \Psi(t)\rangle $$

Since the expectation of a quantum observable usually has a direct classical counterpart, we want to know how the expectation of an observable changes with time:

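$$ \begin{align*} &\frac{\mathrm d}{\mathrm dt} \langle {\bf L} \rangle\\ =& \frac{\mathrm d}{\mathrm dt} \langle \Psi(t) \vert {\bf L} \vert \Psi(t) \rangle\\ =& \left(\frac{\mathrm d}{\mathrm dt} \langle \Psi(t) \vert\right) {\bf L} \vert \Psi(t) \rangle + \langle \Psi(t) \vert {\bf L} \left(\frac{\mathrm d}{\mathrm dt} \vert \Psi(t) \rangle\right)\\ =& \frac{i}{\hbar} \langle \Psi(t) \vert ({\bf HL - LH}) \vert \Psi(t) \rangle\\ =& \frac{i}{\hbar} \langle \Psi(t) \vert {\bf [H,L]} \vert \Psi(t) \rangle \end{align*} $$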

That is:

$$ \frac{\mathrm d}{\mathrm d t} \langle {\bf L} \rangle = -\frac{i}{\hbar} \langle {\bf [L,H]} \rangle $$

Here \(\bf [H,L] = HL-LH\) is the commutator of \(\bf H\) and \(\bf L\). It relates the time variation of the expectation of observable \(\bf L\) to the expectation of another observable. If a quantity \(\bf Q\) has commutator \({\bf [H,Q]} = 0\) with \(\bf H\), then the expectation of \(\bf Q\) does not change with time; that is, \(\bf Q\) is conserved.
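The relation can be tested on a single spin. The sketch below assumes a Hamiltonian \({\bf H} = \frac{\hbar\omega}{2}\sigma_z\) (a spin in a magnetic field along the up direction, chosen here as an example and not taken from the text above), units with \(\hbar = 1\), and an arbitrary value of \(\omega\). It compares a finite-difference derivative of \(\langle {\bf L} \rangle\) with \(\frac{i}{\hbar}\langle {\bf [H,L]} \rangle\).

```python
import cmath
import math

HBAR = 1.0    # assumption: units in which hbar = 1
OMEGA = 1.5   # hypothetical frequency, for illustration only

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
H = [[0.5 * HBAR * OMEGA * e for e in row] for row in SIGMA_Z]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def expectation(op, psi):
    """<psi|op|psi> for a normalized two-component state."""
    op_psi = [sum(op[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2))

def state(t):
    """Exact solution of i*hbar d|psi>/dt = H|psi>, starting from (|u> + |d>)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    phase = cmath.exp(-0.5j * OMEGA * t)
    return [s * phase, s * phase.conjugate()]

def lhs(op, t, dt=1e-6):
    """d<op>/dt estimated by a central finite difference."""
    diff = expectation(op, state(t + dt)) - expectation(op, state(t - dt))
    return diff.real / (2 * dt)

def rhs(op, t):
    """(i/hbar) <[H, op]> in the state at time t."""
    return (1j / HBAR * expectation(commutator(H, op), state(t))).real

t = 0.4
print(lhs(SIGMA_X, t), rhs(SIGMA_X, t))  # the two numbers should agree
print(lhs(SIGMA_Z, t), rhs(SIGMA_Z, t))  # both vanish because [H, sigma_z] = 0
```

In this example \(\sigma_z\) commutes with \(\bf H\), so its expectation is conserved, while \(\langle\sigma_x\rangle\) oscillates.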

The Uncertainty Relation

Suppose we want to observe two quantities \(\bf L, M\) simultaneously. Consider their common eigenvectors:

$$ \begin{align*} {\bf L} \vert \lambda_i, \mu_a\rangle &= \lambda_i \vert \lambda_i, \mu_a\rangle \\ {\bf M} \vert \lambda_i, \mu_a\rangle &= \mu_a \vert \lambda_i, \mu_a\rangle \end{align*} $$

Then:

$$ \begin{align*} {\bf LM} \vert \lambda_i, \mu_a\rangle &= \lambda_i \mu_a \vert \lambda_i, \mu_a\rangle \\ {\bf ML} \vert \lambda_i, \mu_a\rangle &= \mu_a \lambda_i \vert \lambda_i, \mu_a\rangle \end{align*} $$

Therefore:

$$ {\bf [L,M]} \vert \lambda_i, \mu_a\rangle = 0 $$

Since the common eigenvectors of \(\bf L, M\) are complete, this forces \({\bf [L,M]} = 0\). In other words, if we want to measure two quantities simultaneously, they must commute; otherwise they cannot be measured simultaneously.
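The spin components are a concrete case of quantities that do not commute. Using the Pauli matrices from earlier:

$$ \begin{align*} \sigma_x\sigma_y &= \left( \begin{array}{cc} i&0\\ 0&-i \end{array} \right) = i\sigma_z \\ \sigma_y\sigma_x &= \left( \begin{array}{cc} -i&0\\ 0&i \end{array} \right) = -i\sigma_z \\ [\sigma_x, \sigma_y] &= 2i\sigma_z \neq 0 \end{align*} $$

So \(\sigma_x\) and \(\sigma_y\) have no complete set of common eigenvectors, which matches the earlier observation that a spin prepared along one axis gives random results along the others.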

A natural question is: if we cannot measure them simultaneously, to what extent can we not measure them simultaneously? How do we quantify the error? First define:

$$ \bar{\mathbf{A}} = {\bf A} - \langle {\bf A} \rangle I $$

The variance of A is:

$$ (\Delta \mathbf{A})^2 = \langle \Psi \vert \bar{\mathbf{A}}^2 \vert \Psi \rangle $$

In any vector space, the Cauchy-Schwarz inequality gives:

$$ 2\vert X\vert \vert Y\vert \geq \vert \langle X\vert Y \rangle + \langle Y\vert X \rangle\vert $$

Let:

$$ \vert X\rangle = \mathbf{A}\vert \Psi\rangle, \quad \vert Y\rangle = i\mathbf{B}\vert \Psi\rangle $$

Substituting into Cauchy-Schwarz:

$$ 2\sqrt{\langle\mathbf{A}^2\rangle \langle\mathbf{B}^2\rangle} \geq \vert \langle \Psi\vert [\mathbf{A,B}]\vert \Psi \rangle\vert $$

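The same inequality holds with \(\mathbf{A}, \mathbf{B}\) replaced by \(\bar{\mathbf{A}}, \bar{\mathbf{B}}\). These have the same commutator, because the subtracted terms are multiples of \(I\), and \(\langle\bar{\mathbf{A}}^2\rangle = (\Delta\mathbf{A})^2\), \(\langle\bar{\mathbf{B}}^2\rangle = (\Delta\mathbf{B})^2\). Thus: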

$$ \Delta\mathbf{A}\Delta\mathbf{B} \geq \frac{1}{2} \vert \langle \Psi\vert [\mathbf{A,B}]\vert \Psi \rangle\vert $$

This is the most general uncertainty principle. It means that as long as quantities \(\bf A, B\) do not commute, we cannot possess unambiguous knowledge of both at the same time.
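For a single spin the inequality can be checked by hand or numerically. The following sketch takes \({\bf A} = \sigma_x\), \({\bf B} = \sigma_y\) and a sample state \(\cos\frac{\theta}{2}\vert u\rangle + e^{i\phi}\sin\frac{\theta}{2}\vert d\rangle\); the angles are arbitrary values picked for the example.

```python
import cmath
import math

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def expectation(op, psi):
    """<psi|op|psi> for a normalized two-component state."""
    op_psi = [sum(op[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2))

def uncertainty(op, psi):
    """Delta A = sqrt(<A^2> - <A>^2), which equals sqrt(<Abar^2>)."""
    mean = expectation(op, psi).real
    mean_sq = expectation(matmul(op, op), psi).real
    return math.sqrt(max(mean_sq - mean ** 2, 0.0))

# Hypothetical sample state on the Bloch sphere.
theta, phi = 0.8, 0.3
psi = [complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2)]

product = uncertainty(SIGMA_X, psi) * uncertainty(SIGMA_Y, psi)
bound = 0.5 * abs(expectation(commutator(SIGMA_X, SIGMA_Y), psi))
print(product, bound, product >= bound)
```

Because \([\sigma_x, \sigma_y] = 2i\sigma_z\), the bound here is simply \(\vert\langle\sigma_z\rangle\vert\).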

From the inequality above, we can immediately derive Heisenberg's position-momentum uncertainty relation.

First compute \(\bf [X,P]\):

$$ \mathbf{[X,P]}\psi(x) = \mathbf{XP}\psi(x) - \mathbf{PX}\psi(x) = i\hbar \psi(x) $$
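The middle step uses the product rule on \(x\psi(x)\):

$$ \begin{align*} \mathbf{XP}\psi(x) &= -i\hbar x \frac{\mathrm{d}\psi(x)}{\mathrm{d}x} \\ \mathbf{PX}\psi(x) &= -i\hbar \frac{\mathrm{d}}{\mathrm{d}x}\left(x\psi(x)\right) = -i\hbar\psi(x) - i\hbar x \frac{\mathrm{d}\psi(x)}{\mathrm{d}x} \end{align*} $$

Subtracting, the derivative terms cancel and only \(i\hbar\psi(x)\) remains.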

Thus \({\bf [X,P]} = i\hbar\). Substituting into the general uncertainty principle gives:

$$ \Delta\mathbf{X} \Delta\mathbf{P} \geq \frac{\hbar}{2} $$

This is the famous Heisenberg uncertainty relation.
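A Gaussian wave packet is the standard example that reaches this bound with equality. The sketch below checks that numerically; it assumes units with \(\hbar = 1\), an arbitrary packet width, and simple Riemann sums and finite differences in place of exact integrals and derivatives.

```python
import math

HBAR = 1.0   # assumption: units in which hbar = 1
WIDTH = 0.7  # hypothetical packet width, for illustration only

def psi(x):
    """Real Gaussian wave packet, normalized so that the integral of psi^2 is 1."""
    return (math.pi * WIDTH ** 2) ** -0.25 * math.exp(-x * x / (2 * WIDTH ** 2))

def dpsi(x, h=1e-4):
    """Central-difference approximation of psi'(x)."""
    return (psi(x + h) - psi(x - h)) / (2 * h)

def integrate(f, x_max=12.0, n=4000):
    """Riemann sum on [-x_max, x_max]; the integrands are negligible at the ends."""
    dx = 2 * x_max / n
    return sum(f(-x_max + k * dx) for k in range(n + 1)) * dx

norm = integrate(lambda x: psi(x) ** 2)

# <X> = 0 by symmetry and <P> = 0 because psi is real, so the
# uncertainties reduce to sqrt(<X^2>) and sqrt(<P^2>).
delta_x = math.sqrt(integrate(lambda x: x * x * psi(x) ** 2))
# <P^2> = hbar^2 * integral of psi'(x)^2, after one integration by parts.
delta_p = HBAR * math.sqrt(integrate(lambda x: dpsi(x) ** 2))

print(norm, delta_x * delta_p, HBAR / 2)
```

Changing `WIDTH` trades position uncertainty against momentum uncertainty, but the product stays at \(\frac{\hbar}{2}\) for this family of states.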