## Tuesday, June 21, 2016

### Bregman Divergence

Learnt a new mathematical concept today: the Bregman divergence. I knew about the Kullback-Leibler divergence, but this one I didn't know about. The concept of the dual space is very interesting. The following is from the Wikipedia page.

## Definition

Let $F:\Omega \to \mathbb{R}$ be a continuously-differentiable, real-valued and strictly convex function defined on a closed convex set $\Omega$.
The Bregman distance associated with $F$ for points $p, q \in \Omega$ is the difference between the value of $F$ at point $p$ and the value of the first-order Taylor expansion of $F$ around point $q$ evaluated at point $p$:

$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$$
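The definition above translates directly into code. Here is a minimal sketch of a generic Bregman divergence, assuming the caller supplies the convex generator $F$ and its gradient; the function names (`bregman`, `F`, `grad_F`) are illustrative, not from any library.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# With F(x) = ||x||^2, the Bregman divergence is the squared Euclidean distance.
F = lambda x: np.dot(x, x)
grad_F = lambda x: 2.0 * x

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.0])
print(bregman(F, grad_F, p, q))   # 8.0
print(np.sum((p - q) ** 2))       # 8.0, the same value
```

Note that the formula is asymmetric in $p$ and $q$ in general; only for a quadratic generator like $\|x\|^2$ does it reduce to a symmetric distance.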

## Properties

• Non-negativity: $D_F(p, q) \ge 0$ for all $p, q$. This is a consequence of the convexity of $F$.
• Convexity: $D_F(p, q)$ is convex in its first argument, but not necessarily in the second argument (see [1]).
• Linearity: If we think of the Bregman distance as an operator on the function $F$, then it is linear with respect to non-negative coefficients. In other words, for $F_1, F_2$ strictly convex and differentiable, and $\lambda \ge 0$,

$$D_{F_1 + \lambda F_2}(p, q) = D_{F_1}(p, q) + \lambda D_{F_2}(p, q)$$

• Duality: The function $F$ has a convex conjugate $F^*$. The Bregman distance defined with respect to $F^*$ has an interesting relationship to $D_F(p, q)$:

$$D_{F^*}(p^*, q^*) = D_F(q, p)$$

Here, $p^* = \nabla F(p)$ and $q^* = \nabla F(q)$ are the dual points corresponding to $p$ and $q$.
• Mean as minimizer: A key result about Bregman divergences is that, given a random vector, the mean vector minimizes the expected Bregman divergence from the random vector. This result generalizes the textbook result that the mean of a set minimizes total squared error to elements in the set. This result was proved for the vector case by (Banerjee et al. 2005), and extended to the case of functions/distributions by (Frigyik et al. 2008). This result is important because it further justifies using a mean as a representative of a random set, particularly in Bayesian estimation.
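The mean-as-minimizer property can be checked numerically: for the generalized Kullback–Leibler divergence (a Bregman divergence, see the examples below), the empirical mean should achieve a lower average divergence than any other candidate in the second argument. A sketch of such a sanity check, with illustrative names and a perturbation-based comparison rather than a full optimization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3)) + 0.1   # random positive vectors

# Generalized KL divergence, the Bregman divergence of F(p) = sum_i (p_i log p_i - p_i).
def gen_kl(p, q):
    return np.sum(p * np.log(p / q) - p + q, axis=-1)

mean = X.mean(axis=0)

# The mean should beat randomly perturbed candidates as the second argument.
for _ in range(100):
    s = mean + rng.normal(scale=0.05, size=3)
    if np.all(s > 0):
        assert gen_kl(X, mean).mean() <= gen_kl(X, s).mean()
print("mean minimizes expected generalized KL divergence over the tested candidates")
```

Testing only random perturbations is of course weaker than the theorem, which guarantees the mean is the unique global minimizer for any Bregman divergence.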

## Examples

• Squared Euclidean distance $D_F(x, y) = \|x - y\|^2$ is the canonical example of a Bregman distance, generated by the convex function $F(x) = \|x\|^2$.
• The squared Mahalanobis distance $D_F(x, y) = \frac{1}{2}(x - y)^T Q (x - y)$, which is generated by the convex function $F(x) = \frac{1}{2} x^T Q x$. This can be thought of as a generalization of the above squared Euclidean distance.
• The generalized Kullback–Leibler divergence

$$D_F(p, q) = \sum_i p_i \log\frac{p_i}{q_i} - \sum_i p_i + \sum_i q_i$$

is generated by the convex function

$$F(p) = \sum_i p_i \log p_i - \sum_i p_i$$
• The Itakura–Saito distance

$$D_F(p, q) = \sum_i \left(\frac{p_i}{q_i} - \log\frac{p_i}{q_i} - 1\right)$$

is generated by the convex function

$$F(p) = -\sum_i \log p_i$$
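Each closed-form divergence above can be recovered by plugging its generator into the defining formula. A small sketch verifying this for the generalized KL and Itakura–Saito cases (names like `bregman`, `F_kl`, `F_is` are illustrative):

```python
import numpy as np

def bregman(F, grad_F, p, q):
    # D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# Generator of the generalized KL divergence: F(p) = sum_i (p_i log p_i - p_i),
# whose gradient is log p.
F_kl = lambda x: np.sum(x * np.log(x) - x)
grad_kl = lambda x: np.log(x)

direct_kl = np.sum(p * np.log(p / q) - p + q)
assert np.isclose(bregman(F_kl, grad_kl, p, q), direct_kl)

# Generator of the Itakura-Saito distance: F(p) = -sum_i log p_i,
# whose gradient is -1/p.
F_is = lambda x: -np.sum(np.log(x))
grad_is = lambda x: -1.0 / x

direct_is = np.sum(p / q - np.log(p / q) - 1)
assert np.isclose(bregman(F_is, grad_is, p, q), direct_is)
print("both closed forms match the Bregman construction")
```

The same check works for the squared Euclidean and Mahalanobis cases with their quadratic generators.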