Classical Information Theory

Comprehensive course notes for MIT 6.370 Information Theory: From Coding to Learning (Fall 2024, Yury Polyanskiy). This is a very fast-paced, graduate-level treatment of modern information theory.

Highlights of the course:

  1. Fundamental properties of information measures (e.g. entropy, divergence, mutual information, Fisher information): non-negativity, monotonicity, convexity, and the data-processing inequality.
  2. Divergence as a fundamental concept, f-divergences.
  3. Convexity of information measures corresponds to important variational characterizations, which yield tractable bounds and approximations.
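The non-negativity and data-processing properties above are easy to check numerically. A minimal NumPy sketch (the alphabet sizes and the randomly drawn distributions and channel are arbitrary illustrative choices): pushing two distributions through any channel can only shrink their KL divergence.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats, for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Two distributions on a 4-letter alphabet (Dirichlet samples are a.s. positive).
p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(4))

# A random channel W: X -> Y, as a row-stochastic 4x3 matrix.
W = rng.dirichlet(np.ones(3), size=4)

# Push both distributions through the channel.
p_out, q_out = p @ W, q @ W

# Non-negativity, and the data-processing inequality D(PW||QW) <= D(P||Q).
assert kl(p, q) >= 0
assert kl(p_out, q_out) <= kl(p, q) + 1e-12
print(kl(p, q), kl(p_out, q_out))
```

The same two assertions hold for any choice of `p`, `q`, and `W`, which is exactly the monotonicity under processing highlighted above.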

Nontrivial theorems:

  1. Golden formula: variational characterization of mutual information.
  2. Donsker-Varadhan, and generalized variational characterizations of f-divergences.
  3. Local \(\chi^2\) behavior of f-divergences, and Fisher behavior of regular parameterized families.
  4. Harremoës-Vajda: one theorem to rule them all (the joint range of f-divergences).
  5. Hammersley-Chapman-Robbins (HCR) bound, which implies the Cramér-Rao lower bound; van Trees inequality.
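As a quick sanity check of the Donsker-Varadhan characterization \(D(P\|Q) = \sup_f \mathbb{E}_P[f] - \log \mathbb{E}_Q[e^f]\), here is a minimal NumPy sketch on a finite alphabet (the distributions and random test functions are arbitrary illustrative choices): every test function gives a lower bound, and the log-likelihood ratio attains the supremum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two distributions on a 5-letter alphabet.
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))
D = float(np.sum(p * np.log(p / q)))  # KL divergence in nats

def dv_objective(f):
    """Donsker-Varadhan objective E_P[f] - log E_Q[exp(f)]."""
    return float(p @ f - np.log(q @ np.exp(f)))

# Any test function f yields a lower bound on D(P||Q)...
for _ in range(100):
    assert dv_objective(rng.normal(size=5)) <= D + 1e-12

# ...and the optimizer f* = log(p/q) achieves equality.
assert abs(dv_objective(np.log(p / q)) - D) < 1e-12
print(D)
```

The supremum over a restricted function class (e.g. a neural network) is what turns this variational formula into the tractable divergence estimators alluded to in highlight 3.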