Men who made mathematics

Men who made mathematics

Tuesday, June 21, 2016

Bregman Divergence

Learnt a new mathematical concept: Bregman divergence. Knew about Kullback-Leibler divergence but this one I didn't know about. The concept of dual space is very interesting.  the following is from Wiki page.


Let FΩR be a continuously-differentiable real-valued and strictly convex function defined on a closed convex set Ω.
The Bregman distance associated with F for points pqΩ is the difference between the value of F at point p and the value of the first-order Taylor expansion of Faround point q evaluated at point p:
DF p qF pF qF q pq 


  • Non-negativityDF p q0 for all p, q. This is a consequence of the convexity of F.
  • Convexity:DF p q is convex in its first argument, but not necessarily in the second argument (see [1])
  • Linearity: If we think of the Bregman distance as an operator on the function F, then it is linear with respect to non-negative coefficients. In other words, for F1F2strictly convex and differentiable, and λ0,
DF1λF2 p  q  DF1 p  q  λ DF2 p  q 
  • Duality: The function F has a convex conjugate F. The Bregman distance defined with respect to F has an interesting relationship to DF p q
DFpq DF q p
Here, pFp and qFq are the dual points corresponding to p and q.
  • Mean as minimizer: A key result about Bregman divergences is that, given a random vector, the mean vector minimizes the expected Bregman divergence from the random vector. This result generalizes the textbook result that the mean of a set minimizes total squared error to elements in the set. This result was proved for the vector case by (Banerjee et al. 2005), and extended to the case of functions/distributions by (Frigyik et al. 2008). This result is important because it further justifies using a mean as a representative of a random set, particularly in Bayesian estimation.


  • Squared Euclidean distance DF x yxy2 is the canonical example of a Bregman distance, generated by the convex function Fxx2
  • The squared Mahalanobis distanceDF x y12 xyTQ xy  which is generated by the convex function Fx12xTQx. This can be thought of as a generalization of the above squared Euclidean distance.
  • The generalized Kullback–Leibler divergence
DF p  q  p i logpiqi p  i  q i 
is generated by the convex function
F p ip i logp i p i 
DF  p  q   ipiqilogpiqi1
is generated by the convex function
F p logp i 

No comments:

Post a Comment