Mirror Descent

Problem

\min\limits_{x \in C} f(x)

Mirror Descent

\phi: Legendre function

\phi^*: Legendre Dual with \nabla\phi^* = (\nabla\phi)^{-1}

D_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle: Bregman Divergence
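As a quick sanity check of the definition, the Bregman divergence can be computed directly from \phi and \nabla\phi. A minimal sketch in Python (NumPy, with two classic choices of \phi — the example vectors are made up): the squared 2-norm recovers the (halved) squared Euclidean distance, and negative entropy recovers the KL divergence.

```python
import numpy as np

# Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
def bregman(phi, grad_phi, x, y):
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = 1/2 ||x||_2^2  ->  D_phi(x, y) = 1/2 ||x - y||_2^2.
sq = lambda x: 0.5 * np.dot(x, x)
grad_sq = lambda x: x

# phi(x) = sum_i x_i log x_i (negative entropy)  ->  D_phi(x, y) = KL(x || y)
# for probability vectors x, y.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

print(bregman(sq, grad_sq, x, y))         # = 0.5 * ||x - y||_2^2
print(bregman(negent, grad_negent, x, y)) # = sum_i x_i log(x_i / y_i)
```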

Traditional Algorithm:

x_{k+1} = proj_C^{\phi}(\nabla\phi^*(\nabla\phi(x_k) - \alpha_k g_k)), where proj_C^{\phi}(y) = \arg\min\limits_{x \in C} D_{\phi}(x, y) and g_k \in \partial f(x_k)
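As an illustrative sketch (not taken from the referenced paper), here is the update specialized to the probability simplex with the negative-entropy mirror map \phi(x) = \sum_i x_i \log x_i. Then \nabla\phi(x) = \log x + 1, the dual step happens in log-space, and the Bregman projection onto the simplex reduces to a renormalization, giving the exponentiated-gradient update x_{k+1} \propto x_k \cdot e^{-\alpha_k g_k}. The objective f(x) = \langle c, x\rangle and all parameter values below are made up for the demo.

```python
import numpy as np

# Mirror descent on the probability simplex with the negative-entropy
# mirror map. The dual step plus mirroring back, followed by the Bregman
# (KL) projection, collapses into a multiplicative update + renormalization.
def mirror_descent_simplex(grad, x0, alpha, T):
    x = x0.copy()
    for _ in range(T):
        g = grad(x)                 # g_k: (sub)gradient of f at x_k
        x = x * np.exp(-alpha * g)  # nabla phi*(nabla phi(x_k) - alpha g_k)
        x = x / x.sum()             # Bregman projection onto the simplex
    return x

# Toy objective f(x) = <c, x> over the simplex; the minimizer puts all
# mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x0 = np.ones(3) / 3.0
x = mirror_descent_simplex(lambda x: c, x0, alpha=0.5, T=200)
print(x)  # concentrates near [0, 1, 0]
```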

Equivalent Expression:

The Mirror Descent algorithm can be equivalently expressed as:

x_{k+1} = \arg\min_{x \in C}\{f(x_k) + \langle g_k, x - x_k\rangle + \frac{1}{\alpha_k}D_{\phi}(x, x_k)\}
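One way to see the equivalence: dropping the constraint for a moment, the first-order optimality condition of this subproblem recovers the dual-space update (the constrained case replaces the plain inverse map by the Bregman projection):

```latex
% Optimality of the unconstrained proximal subproblem:
0 = g_k + \frac{1}{\alpha_k}\bigl(\nabla\phi(x_{k+1}) - \nabla\phi(x_k)\bigr)
\;\Longrightarrow\;
\nabla\phi(x_{k+1}) = \nabla\phi(x_k) - \alpha_k g_k
\;\Longrightarrow\;
x_{k+1} = \nabla\phi^*\bigl(\nabla\phi(x_k) - \alpha_k g_k\bigr).
```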

Comparison with projected subgradient descent

projected subgradient descent algorithm:

x_{k+1} = proj_C^{\|\cdot\|_2}(x_k - \alpha_k g_k)

and similarly, it can be equivalently expressed as:

x_{k+1} = \arg\min_{x \in C}\{f(x_k) + \langle g_k, x - x_k\rangle + \frac{1}{2\alpha_k}\|x - x_k\|_2^2\}

So Mirror Descent differs from Projected Subgradient Descent only in that it replaces the squared 2-norm proximity term with a more general Bregman divergence.
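To make the contrast concrete, here is a sketch of projected subgradient descent on the same kind of simplex-constrained toy problem (objective and parameters made up for the demo). Note that the Euclidean projection onto the simplex needs a small algorithm of its own (the usual sort-and-threshold routine), whereas the Bregman projection under negative entropy is a simple renormalization.

```python
import numpy as np

# Euclidean projection onto the probability simplex
# (classic sort-and-threshold routine).
def project_simplex(y):
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def projected_subgradient(grad, x0, alpha, T):
    x = x0.copy()
    for _ in range(T):
        x = project_simplex(x - alpha * grad(x))  # Euclidean step + projection
    return x

c = np.array([3.0, 1.0, 2.0])
x = projected_subgradient(lambda x: c, np.ones(3) / 3.0, alpha=0.1, T=200)
print(x)  # also converges to the vertex [0, 1, 0]
```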

Convergence Rate for Mirror Descent

Assumption:

  1. \phi is \mu-strongly convex with respect to some norm \|\cdot\|, namely \phi(x) \geq \phi(y) + \langle \nabla \phi(y), x - y\rangle + \frac{\mu}{2}\|x - y\|^2
  2. D_{\phi}(x^*, x_1) \leq R^2 and \|g_k\|_*^2 \leq G^2, where \|\cdot\|_* is the dual norm of \|\cdot\|
  3. \alpha_k = \frac{R}{G}\sqrt{\frac{2\mu}{T}}

Then we have \min\limits_{k = 1, \dots, T} f(x_k) - f(x^*) \leq RG\sqrt{\frac{2}{\mu T}}
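A sketch of how this rate is usually obtained, following the standard mirror descent analysis:

```latex
% One-step estimate: convexity of f, the three-point identity for
% D_\phi, and \mu-strong convexity of \phi give, for every k,
\alpha_k\bigl(f(x_k) - f(x^*)\bigr)
  \leq D_{\phi}(x^*, x_k) - D_{\phi}(x^*, x_{k+1})
       + \frac{\alpha_k^2}{2\mu}\|g_k\|_*^2.
% Summing over k = 1, \dots, T, telescoping the divergence terms, and
% bounding D_\phi(x^*, x_1) \leq R^2 and \|g_k\|_* \leq G, a constant
% step size chosen to balance the two resulting terms yields the bound.
```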

Reference

  1. Paper: https://web.iem.technion.ac.il/images/user-files/becka/papers/3.pdf
  2. Lecture note on Bregman divergences: http://users.cecs.anu.edu.au/~xzhang/teaching/bregman.pdf
