Information Theory for Machine Learning: Theorems, Proofs, and Python Implementations

Author:   Yehuda Setnik
Publisher:   Independently Published
ISBN:   9798273722620
Pages:   374
Publication Date:   09 November 2025
Format:   Paperback
Availability:   Available To Order
We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.

Our Price:   $211.17

Overview

The complete graduate-level reference for entropy, divergence, and mutual information in modern machine learning, rigorously developed from measure theory to contemporary estimators and algorithms.

Coverage:
Measure-theoretic foundations: sigma-algebras, Radon-Nikodym, conditional expectation, change of measure.
Core measures: entropy, cross-entropy, KL, mutual information; f-divergences and Rényi divergences with variational dualities (Fenchel, Donsker-Varadhan).
Data processing and fundamental inequalities: log-sum, Pinsker, Csiszár-Kullback-Pinsker, Fano, Le Cam, Assouad; equality conditions and sufficiency.
Gaussian tools: entropy power inequality, de Bruijn identity, Fisher information, I-MMSE, Gaussian extremality.
Maximum entropy and exponential families: log-partition convexity, Bregman geometry, Pythagorean theorems.
Fisher information and asymptotics: score, Cramér-Rao bounds, LAN, Bernstein-von Mises, asymptotic efficiency.
Information geometry and natural gradients: Fisher-Rao metric, dual connections, mirror descent.
Source coding and MDL: Kraft-McMillan, NML, universal coding, compression-generalization links.
Generalization: PAC-Bayes bounds, mutual information bounds I(W;S), stability of SGD.
Concentration via information: DV method, log-Sobolev and Poincaré inequalities, transportation T1/T2, hypercontractivity.
Variational inference and divergence minimization: ELBO, alpha-divergences, EP, black-box VI with reparameterization.
Estimating entropy and MI: plug-in, kNN, KDE, Kraskov, MINE, InfoNCE; minimax rates and consistency.
Rate-distortion and information bottleneck: Blahut-Arimoto, optimal encoders, sufficiency-compression trade-offs.
Contrastive representation learning under augmentations: alignment vs. uniformity, identifiability, sample complexity.
Generative modeling: VAEs, bits-back coding, beta-VAE, TCVAE; likelihood calibration and posterior collapse.
Score matching and Stein: Fisher divergence, kernel Stein discrepancies; diffusion models as score-based SDEs with likelihood estimation.
Optimal transport with entropic regularization: Kantorovich duality, Sinkhorn, Schrödinger bridges; OT vs. f-divergence objectives.
Distributed and federated learning under communication limits: quantization, gradient coding, lower bounds via information.
Privacy and leakage: differential privacy, Rényi DP, moments accountant; accuracy-privacy trade-offs and inference risks.
Active learning and Bayesian experimental design: expected information gain, submodularity, scalable estimators.
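
As a small taste of the "Python Implementations" in the title, the sketch below computes discrete versions of the book's core quantities: Shannon entropy, KL divergence, and mutual information written as the KL divergence between a joint distribution and the product of its marginals. It is a minimal NumPy illustration; the function names and the toy 2x2 joint distribution are our own, not taken from the book.

import numpy as np


def entropy(p, base=2):
    """Shannon entropy H(p) = -sum_x p(x) log p(x) of a discrete pmf."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log(p)) / np.log(base))


def kl_divergence(p, q, base=2):
    """KL(p || q) = sum_x p(x) log(p(x)/q(x)); infinite if q vanishes where p does not."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")  # absolute continuity fails
    return float(np.sum(p[support] * np.log(p[support] / q[support])) / np.log(base))


def mutual_information(pxy, base=2):
    """I(X;Y) = KL(p(x,y) || p(x) p(y)) for a joint pmf given as a 2-D array."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # marginal of X (rows)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of Y (columns)
    return kl_divergence(pxy.ravel(), (px * py).ravel(), base=base)


if __name__ == "__main__":
    # Toy joint distribution: a binary X observed through a noisy channel (hypothetical example).
    pxy = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
    print("H(X)   =", entropy(pxy.sum(axis=1)))   # 1.0 bit
    print("I(X;Y) =", mutual_information(pxy))    # about 0.278 bits

On this joint the marginals are uniform, so H(X) = 1 bit and I(X;Y) = H(X) + H(Y) - H(X,Y) ≈ 0.278 bits, which is what the script prints.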

Full Product Details

Author:   Yehuda Setnik
Publisher:   Independently Published
Imprint:   Independently Published
Dimensions:   Width: 21.60 cm, Height: 2.00 cm, Length: 27.90 cm
Weight:   0.866kg
ISBN:   9798273722620
Pages:   374
Publication Date:   09 November 2025
Audience:   General/trade, General
Format:   Paperback
Publisher's Status:   Active
Availability:   Available To Order

Table of Contents

Reviews

Author Information

Countries Available

All regions