## Tuesday, June 21, 2016

### Bregman Divergence

Learnt a new mathematical concept today: the Bregman divergence. I knew about the Kullback-Leibler divergence, but this one I didn't know about. The concept of the dual space is very interesting. The following is from the Wikipedia page.

## Definition

Let $F:\Omega \to \mathbb{R}$ be a continuously-differentiable, real-valued and strictly convex function defined on a closed convex set $\Omega$.
The Bregman distance associated with $F$ for points $p, q \in \Omega$ is the difference between the value of $F$ at point $p$ and the value of the first-order Taylor expansion of $F$ around point $q$ evaluated at point $p$:

$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$$
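The definition above translates directly into code. Here is a minimal sketch of a generic Bregman divergence, assuming the caller supplies the convex generator $F$ and its gradient; the function names (`bregman`, `F`, `grad_F`) are illustrative, not from any library.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# With F(x) = ||x||^2, the Bregman divergence is the squared Euclidean distance.
F = lambda x: np.dot(x, x)
grad_F = lambda x: 2.0 * x

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.0])
print(bregman(F, grad_F, p, q))   # 8.0
print(np.sum((p - q) ** 2))       # 8.0, the same value
```

Note that the formula is asymmetric in $p$ and $q$ in general; only for a quadratic generator like $\|x\|^2$ does it reduce to a symmetric distance.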

## Properties

• Non-negativity: $D_F(p, q) \ge 0$ for all $p, q$. This is a consequence of the convexity of $F$.
• Convexity: $D_F(p, q)$ is convex in its first argument, but not necessarily in the second argument (see [1]).
• Linearity: If we think of the Bregman distance as an operator on the function $F$, then it is linear with respect to non-negative coefficients. In other words, for $F_1, F_2$ strictly convex and differentiable, and $\lambda \ge 0$,

$$D_{F_1 + \lambda F_2}(p, q) = D_{F_1}(p, q) + \lambda D_{F_2}(p, q)$$

• Duality: The function $F$ has a convex conjugate $F^*$. The Bregman distance defined with respect to $F^*$ has an interesting relationship to $D_F(p, q)$:

$$D_{F^*}(p^*, q^*) = D_F(q, p)$$

Here, $p^* = \nabla F(p)$ and $q^* = \nabla F(q)$ are the dual points corresponding to $p$ and $q$.
• Mean as minimizer: A key result about Bregman divergences is that, given a random vector, the mean vector minimizes the expected Bregman divergence from the random vector. This result generalizes the textbook result that the mean of a set minimizes total squared error to elements in the set. This result was proved for the vector case by (Banerjee et al. 2005), and extended to the case of functions/distributions by (Frigyik et al. 2008). This result is important because it further justifies using a mean as a representative of a random set, particularly in Bayesian estimation.
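The mean-as-minimizer property can be checked numerically: for the generalized Kullback–Leibler divergence (a Bregman divergence, see the examples below), the empirical mean should achieve a lower average divergence than any other candidate in the second argument. A sketch of such a sanity check, with illustrative names and a perturbation-based comparison rather than a full optimization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3)) + 0.1   # random positive vectors

# Generalized KL divergence, the Bregman divergence of F(p) = sum_i (p_i log p_i - p_i).
def gen_kl(p, q):
    return np.sum(p * np.log(p / q) - p + q, axis=-1)

mean = X.mean(axis=0)

# The mean should beat randomly perturbed candidates as the second argument.
for _ in range(100):
    s = mean + rng.normal(scale=0.05, size=3)
    if np.all(s > 0):
        assert gen_kl(X, mean).mean() <= gen_kl(X, s).mean()
print("mean minimizes expected generalized KL divergence over the tested candidates")
```

Testing only random perturbations is of course weaker than the theorem, which guarantees the mean is the unique global minimizer for any Bregman divergence.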

## Examples

• Squared Euclidean distance $D_F(x, y) = \|x - y\|^2$ is the canonical example of a Bregman distance, generated by the convex function $F(x) = \|x\|^2$.
• The squared Mahalanobis distance $D_F(x, y) = \frac{1}{2}(x - y)^T Q (x - y)$, which is generated by the convex function $F(x) = \frac{1}{2} x^T Q x$. This can be thought of as a generalization of the above squared Euclidean distance.
• The generalized Kullback–Leibler divergence

$$D_F(p, q) = \sum_i p_i \log\frac{p_i}{q_i} - \sum_i p_i + \sum_i q_i$$

is generated by the convex function

$$F(p) = \sum_i p_i \log p_i - \sum_i p_i$$
• The Itakura–Saito distance

$$D_F(p, q) = \sum_i \left(\frac{p_i}{q_i} - \log\frac{p_i}{q_i} - 1\right)$$

is generated by the convex function

$$F(p) = -\sum_i \log p_i$$
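Each closed-form divergence above can be recovered by plugging its generator into the defining formula. A small sketch verifying this for the generalized KL and Itakura–Saito cases (names like `bregman`, `F_kl`, `F_is` are illustrative):

```python
import numpy as np

def bregman(F, grad_F, p, q):
    # D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# Generator of the generalized KL divergence: F(p) = sum_i (p_i log p_i - p_i),
# whose gradient is log p.
F_kl = lambda x: np.sum(x * np.log(x) - x)
grad_kl = lambda x: np.log(x)

direct_kl = np.sum(p * np.log(p / q) - p + q)
assert np.isclose(bregman(F_kl, grad_kl, p, q), direct_kl)

# Generator of the Itakura-Saito distance: F(p) = -sum_i log p_i,
# whose gradient is -1/p.
F_is = lambda x: -np.sum(np.log(x))
grad_is = lambda x: -1.0 / x

direct_is = np.sum(p / q - np.log(p / q) - 1)
assert np.isclose(bregman(F_is, grad_is, p, q), direct_is)
print("both closed forms match the Bregman construction")
```

The same check works for the squared Euclidean and Mahalanobis cases with their quadratic generators.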