Mirror Descent

Problem

\min\limits_{x \in C} f(x)

Mirror Descent

\phi: Legendre function

\phi^*: Legendre Dual with \nabla\phi^* = (\nabla\phi)^{-1}

D_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle: Bregman Divergence
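As a quick sanity check of the definition, the Bregman divergence can be computed directly from \phi and \nabla\phi. A minimal sketch in Python (NumPy, with two classic choices of \phi — the example vectors are made up): the squared 2-norm recovers the (halved) squared Euclidean distance, and negative entropy recovers the KL divergence.

```python
import numpy as np

# Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
def bregman(phi, grad_phi, x, y):
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = 1/2 ||x||_2^2  ->  D_phi(x, y) = 1/2 ||x - y||_2^2.
sq = lambda x: 0.5 * np.dot(x, x)
grad_sq = lambda x: x

# phi(x) = sum_i x_i log x_i (negative entropy)  ->  D_phi(x, y) = KL(x || y)
# for probability vectors x, y.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

print(bregman(sq, grad_sq, x, y))         # = 0.5 * ||x - y||_2^2
print(bregman(negent, grad_negent, x, y)) # = sum_i x_i log(x_i / y_i)
```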

Traditional Algorithm:

x_{k+1} = proj_C^{\phi}(\nabla\phi^*(\nabla\phi(x_k) - \alpha_k g_k)), where proj_C^{\phi}(y) = \arg\min\limits_{x \in C} D_{\phi}(x, y) and g_k \in \partial f(x_k)
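As an illustrative sketch (not taken from the referenced paper), here is the update specialized to the probability simplex with the negative-entropy mirror map \phi(x) = \sum_i x_i \log x_i. Then \nabla\phi(x) = \log x + 1, the dual step happens in log-space, and the Bregman projection onto the simplex reduces to a renormalization, giving the exponentiated-gradient update x_{k+1} \propto x_k \cdot e^{-\alpha_k g_k}. The objective f(x) = \langle c, x\rangle and all parameter values below are made up for the demo.

```python
import numpy as np

# Mirror descent on the probability simplex with the negative-entropy
# mirror map. The dual step plus mirroring back, followed by the Bregman
# (KL) projection, collapses into a multiplicative update + renormalization.
def mirror_descent_simplex(grad, x0, alpha, T):
    x = x0.copy()
    for _ in range(T):
        g = grad(x)                 # g_k: (sub)gradient of f at x_k
        x = x * np.exp(-alpha * g)  # nabla phi*(nabla phi(x_k) - alpha g_k)
        x = x / x.sum()             # Bregman projection onto the simplex
    return x

# Toy objective f(x) = <c, x> over the simplex; the minimizer puts all
# mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x0 = np.ones(3) / 3.0
x = mirror_descent_simplex(lambda x: c, x0, alpha=0.5, T=200)
print(x)  # concentrates near [0, 1, 0]
```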

Equivalent Expression:

The Mirror Descent algorithm can be equivalently expressed as:

x_{k+1} = \arg\min_{x \in C}\{f(x_k) + \langle g_k, x - x_k\rangle + \frac{1}{\alpha_k}D_{\phi}(x, x_k)\}
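One way to see the equivalence: dropping the constraint for a moment, the first-order optimality condition of this subproblem recovers the dual-space update (the constrained case replaces the plain inverse map by the Bregman projection):

```latex
% Optimality of the unconstrained proximal subproblem:
0 = g_k + \frac{1}{\alpha_k}\bigl(\nabla\phi(x_{k+1}) - \nabla\phi(x_k)\bigr)
\;\Longrightarrow\;
\nabla\phi(x_{k+1}) = \nabla\phi(x_k) - \alpha_k g_k
\;\Longrightarrow\;
x_{k+1} = \nabla\phi^*\bigl(\nabla\phi(x_k) - \alpha_k g_k\bigr).
```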

Comparison with projected subgradient descent

projected subgradient descent algorithm:

x_{k+1} = proj_C^{\|\cdot\|_2}(x_k - \alpha_k g_k)

and similarly, it can be equivalently expressed as:

x_{k+1} = \arg\min_{x \in C}\{f(x_k) + \langle g_k, x - x_k\rangle + \frac{1}{2\alpha_k}\|x - x_k\|_2^2\}

So Mirror Descent differs from Projected Subgradient Descent only in that it replaces the squared 2-norm proximity term with a more general Bregman divergence.
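To make the contrast concrete, here is a sketch of projected subgradient descent on the same kind of simplex-constrained toy problem (objective and parameters made up for the demo). Note that the Euclidean projection onto the simplex needs a small algorithm of its own (the usual sort-and-threshold routine), whereas the Bregman projection under negative entropy is a simple renormalization.

```python
import numpy as np

# Euclidean projection onto the probability simplex
# (classic sort-and-threshold routine).
def project_simplex(y):
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def projected_subgradient(grad, x0, alpha, T):
    x = x0.copy()
    for _ in range(T):
        x = project_simplex(x - alpha * grad(x))  # Euclidean step + projection
    return x

c = np.array([3.0, 1.0, 2.0])
x = projected_subgradient(lambda x: c, np.ones(3) / 3.0, alpha=0.1, T=200)
print(x)  # also converges to the vertex [0, 1, 0]
```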

Convergence Rate for Mirror Descent

Assumption:

  1. \phi is \mu-strongly convex with respect to some norm \|\cdot\|, namely \phi(x) \geq \phi(y) + \langle \nabla \phi(y), x - y\rangle + \frac{\mu}{2}\|x - y\|^2
  2. D_{\phi}(x^*, x_1) \leq R^2 and \|g_k\|_*^2 \leq G^2, where \|\cdot\|_* is the dual norm of \|\cdot\|
  3. \alpha_k = \frac{R}{G}\sqrt{\frac{2\mu}{T}}

Then we have \min\limits_{k = 1, \dots, T} f(x_k) - f(x^*) \leq RG\sqrt{\frac{2}{\mu T}}
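A sketch of how this rate is usually obtained, following the standard mirror descent analysis:

```latex
% One-step estimate: convexity of f, the three-point identity for
% D_\phi, and \mu-strong convexity of \phi give, for every k,
\alpha_k\bigl(f(x_k) - f(x^*)\bigr)
  \leq D_{\phi}(x^*, x_k) - D_{\phi}(x^*, x_{k+1})
       + \frac{\alpha_k^2}{2\mu}\|g_k\|_*^2.
% Summing over k = 1, \dots, T, telescoping the divergence terms, and
% bounding D_\phi(x^*, x_1) \leq R^2 and \|g_k\|_* \leq G, a constant
% step size chosen to balance the two resulting terms yields the bound.
```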

Reference

  1. Paper: https://web.iem.technion.ac.il/images/user-files/becka/papers/3.pdf
  2. Lecture note on Bregman divergences: http://users.cecs.anu.edu.au/~xzhang/teaching/bregman.pdf
