Let $f\colon \mathbb{R}^n \to \mathbb{R}^m$ be a function and $K$ be a proper cone.
Without further discussion, we state that many results for convex functions extend to $K$-convexity.
\subsection{Convex Optimization}
We consider general optimization problems in \emph{standard form}:
\begin{align*}
\text{minimize }& f_0(x)\\
\text{subject to }& f_i(x) \leq 0 & i \in\{1, \ldots, m\}\\
& h_i(x) = 0 & i \in\{1, \ldots, p\}.
\end{align*}
A feasible point $x$ is \emph{(globally) optimal} if $f_0(x)=\inf\{f_0(y) : y \text{ feasible}\}$. It is \emph{locally optimal} if there exists $R > 0$ such that $x$ is optimal for the problem restricted to the feasible points with $\norm{z-x}\leq R$. The \emph{feasibility problem} can be stated in standard form:
\begin{align*}
\text{minimize }& 0\\
\text{subject to }& f_i(x) \leq 0 & i \in\{1, \ldots, m\}\\
& h_i(x) = 0 & i \in\{1, \ldots, p\}.
\end{align*}
An optimization problem is called \emph{convex} if $f_i$ is convex for all $i \in\{0, \ldots, m\}$ and $h_i$ is affine for all $i \in\{1, \ldots, p\}$.
\begin{theorem}
Any locally optimal point of a convex optimization problem is globally optimal.
%TODO proof
\end{theorem}
For a convex problem with differentiable objective $f_0$, a point $x$ is optimal iff
\begin{enumerate}
\item $x$ is feasible and
\item $\nabla f_0(x)^T (y-x)\geq0$ for all feasible $y$.
\end{enumerate}
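To make the condition concrete, here is a minimal Python sketch on a hypothetical toy problem (the data, $f_0(x) = (x_1-2)^2 + (x_2-2)^2$ with feasible set $x_1 + x_2 \leq 2$, is our own illustration, not from the text; the optimum is $x^\ast = (1, 1)$):

```python
import random

# Toy convex problem (hypothetical example): minimize (x1-2)^2 + (x2-2)^2
# subject to x1 + x2 <= 2.  The optimum is x* = (1, 1).

def grad_f0(x):
    return (2 * (x[0] - 2), 2 * (x[1] - 2))

def random_feasible():
    # Sample a feasible point with x1 + x2 <= 2.
    x1 = random.uniform(-3, 3)
    x2 = random.uniform(-3, 2 - x1)
    return (x1, x2)

x_star = (1.0, 1.0)
g = grad_f0(x_star)  # (-2, -2)
# Check grad_f0(x*)^T (y - x*) >= 0 for many feasible y
# (small tolerance for floating-point rounding).
ok = all(
    g[0] * (y[0] - x_star[0]) + g[1] * (y[1] - x_star[1]) >= -1e-9
    for y in (random_feasible() for _ in range(1000))
)
print(ok)
```

Analytically the inner product equals $-2(y_1 + y_2 - 2) \geq 0$ for every feasible $y$, which is what the sampling confirms.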
\begin{figure}
%TODO iso contours and gradient directions
\end{figure}
In order to simplify convex problems, we can use some common transformations:
\begin{itemize}
\item Eliminating equality constraints: $Ax = b \iff x = Fz + x_0$ for some $z$ (where the columns of $F$ span $\ker A$); then minimize over $z$.
\item Introducing equality constraints.
\item Introducing slack variables for linear inequalities: $a_i^T x + s_i = b_i$, $s_i \geq 0$.
\end{itemize}
\subsection{Duality}
For a problem in standard form with domain $D$, the \emph{Lagrangian} is
$$
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
$$
and the \emph{Lagrange dual function} is $g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)$; for every $\lambda \succeq 0$, $g(\lambda, \nu)$ is a lower bound on the primal optimal value $p^\ast$. Solving the Lagrange dual problem, i.e.\ maximizing $g$, yields the best such lower bound on $p^\ast$, which we denote $d^\ast$. Note that the dual problem is always convex even if the primal problem is not. The Lagrange dual variables $\lambda, \nu$ are \emph{dual feasible} if $\lambda \succeq 0$ and $(\lambda, \nu) \in \dom g$.
\begin{theorem}[Weak Duality]
$d^\ast\leq p^\ast$.
%TODO proof
\end{theorem}
Weak duality always holds, for convex and non-convex problems alike, and can be used to find non-trivial lower bounds for difficult problems. We call the difference $p^\ast - d^\ast$ the \emph{(optimal) duality gap}.
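As an illustration of weak duality, consider a hypothetical one-dimensional problem (our own example, not from the text): minimize $x^2$ subject to $1 - x \leq 0$, so $p^\ast = 1$ at $x^\ast = 1$. A short sketch evaluating the dual function:

```python
# Toy problem (hypothetical): minimize x^2 subject to 1 - x <= 0, p* = 1.
# Lagrangian L(x, lam) = x^2 + lam * (1 - x); minimizing over x gives
# x = lam / 2, hence the dual function g(lam) = lam - lam**2 / 4.

def g(lam):
    return lam - lam ** 2 / 4

p_star = 1.0
lambdas = [0.5 * k for k in range(9)]            # dual-feasible lam >= 0
assert all(g(lam) <= p_star for lam in lambdas)  # weak duality: g <= p*
d_star = max(g(lam) for lam in lambdas)
print(d_star)  # 1.0, attained at lam = 2 (here strong duality holds too)
```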
The following theorem does not hold in general, but it usually holds for convex problems.
\begin{theorem}[Strong Duality]
$d^\ast= p^\ast$.
\end{theorem}
To make the ``usually'' in the previous statement more precise, consider \emph{Slater's constraint qualification}: Strong duality holds for a convex problem in standard form if it is strictly feasible, i.e.\ there exists $x \in\inter D$ such that $f_i(x) < 0$ for all $i \in\{1, \ldots, m\}$ and $Ax = b$.
Another strong connection between primal and dual problem is due to the following theorem.
\begin{theorem}[Complementary Slackness]
Assume strong duality holds and $x^\ast$ and $(\lambda^\ast, \nu^\ast)$ are optimal for the primal and dual problem, respectively. Then $\lambda^\ast_i f_i(x^\ast)=0$ for all $i \in\{1, \ldots, m\}$.
\end{theorem}
Strong duality and the absence of a feasible descent direction at optimal points lead to the next theorem, which generalizes the first-order optimality condition (not to be confused with the first-order convexity condition from the previous section) $\nabla f_0(x)=0$.
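Complementary slackness can be checked numerically; a minimal sketch on a hypothetical toy problem (our own example): minimize $(x-2)^2$ subject to $x - 1 \leq 0$ and $-1 - x \leq 0$, with optimum $x^\ast = 1$ and multipliers $\lambda^\ast = (2, 0)$ obtained from stationarity and the inactive second constraint:

```python
# Complementary slackness on a toy problem (hypothetical example):
# minimize (x-2)^2 subject to x - 1 <= 0 and -1 - x <= 0.
# The optimum is x* = 1; stationarity 2(x* - 2) + lam1 - lam2 = 0
# together with lam2 = 0 (inactive constraint) gives lam1 = 2.

x_star, lam = 1.0, (2.0, 0.0)
f = (x_star - 1.0, -1.0 - x_star)      # constraint values at x*
products = [l * fi for l, fi in zip(lam, f)]
print(products)  # every lam_i * f_i(x*) equals 0
```

Note the pattern: the active constraint may have a positive multiplier, while the inactive one ($f_2(x^\ast) = -2 < 0$) forces its multiplier to zero.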
\begin{theorem}[Karush, Kuhn, Tucker]
Assume strong duality holds and $x, \lambda, \nu$ are optimal, then
\begin{enumerate}
\item primal feasibility, i.e. $f_i(x)\leq0$ for $i \in\{1, \ldots, m\}$, $h_i(x)=0$ for $i \in\{1, \ldots, p\}$ holds,
\item dual feasibility, i.e. $\lambda_i \geq0$ for $i \in\{1, \ldots, m\}$ holds,
\item complementary slackness, i.e. $\lambda_i f_i(x)=0$ for $i \in\{1, \ldots, m\}$ holds, and
\item the gradient of the Lagrangian w.r.t.\ $x$ vanishes, i.e.
$$
\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0.
$$
\end{enumerate}
\end{theorem}
We can even solve the primal problem via the dual: assume we have optimal duals $(\lambda^\ast, \nu^\ast)$. Then it suffices to minimize $f_0(x)+\sum_{i=1}^m \lambda_i^\ast f_i(x)+\sum_{i=1}^p \nu_i^\ast h_i(x)$ over $x$.
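A minimal sketch of this recovery on the hypothetical toy problem minimize $x^2$ subject to $1 - x \leq 0$ (our own example), whose optimal dual is $\lambda^\ast = 2$:

```python
# Recovering the primal solution from optimal duals (hypothetical example):
# minimize x^2 subject to 1 - x <= 0; the optimal dual is lam* = 2.
# Minimizing L(x, lam*) = x^2 + 2 * (1 - x) over x recovers x* = 1.

def lagrangian(x, lam=2.0):
    return x ** 2 + lam * (1.0 - x)

# Crude minimization over a grid; a real solver would use the
# stationarity condition dL/dx = 2x - lam = 0 instead.
xs = [k / 1000 for k in range(-2000, 2001)]
x_rec = min(xs, key=lagrangian)
print(x_rec)  # 1.0, the primal optimum
```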
\begin{lemma}
Let $x$ be primal feasible and $(\lambda, \nu)$ dual feasible. Then
$$
f_0(x)- p^\ast\leq f_0(x)- g(\lambda, \nu).
$$
%TODO proof
\end{lemma}
Thus, $p^\ast\in[g(\lambda, \nu), f_0(x)]$ and $d^\ast\in[g(\lambda, \nu), f_0(x)]$, i.e.\ if $f_0(x)- g(\lambda, \nu)=0$, this certifies that $x$ and $(\lambda, \nu)$ are optimal.
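A sketch of this bracketing on the hypothetical toy problem minimize $x^2$ subject to $1 - x \leq 0$ (our own example, $p^\ast = 1$), using a deliberately suboptimal primal-dual pair:

```python
# Bracketing p* with a primal-dual feasible pair (hypothetical example):
# minimize x^2 subject to 1 - x <= 0, so p* = 1, and g(lam) = lam - lam**2/4.

def f0(x):
    return x ** 2

def g(lam):
    return lam - lam ** 2 / 4

x, lam = 1.5, 1.0             # feasible but suboptimal pair
lower, upper = g(lam), f0(x)  # p* must lie in [lower, upper]
print((lower, upper))         # (0.75, 2.25) brackets p* = 1
```

The width of the bracket, $f_0(x) - g(\lambda, \nu)$, bounds how suboptimal the current pair is, which is exactly why a gap of $0$ is a certificate.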
\section{Algorithms}
For unconstrained problems, we can simply use the optimality condition $\nabla f_0(x^\ast)=0$ as a starting point. We briefly describe a few algorithms that compute a solution numerically; solving exactly is in general not possible.
The main idea of our algorithms is very generic and can be described in pseudocode with a few lines.
\begin{algorithm}
\While{stopping criterion not met}{
Determine descent direction $\Delta x$.\\
Line search: choose step size $t$.\\
$x \coloneqq x + t \Delta x$.
}
\end{algorithm}
This generic approach works for convex problems: from $\nabla f(x^{(k)})^T (y-x^{(k)})\geq0$ we can derive $f(y)\geq f(x^{(k)})$, hence the search direction must satisfy $\nabla f(x^{(k)})^T \Delta x^{(k)} < 0$.
For the line search there exist two general ideas.
\begin{description}
\item[exact line search] $t \coloneqq \arg\min_{t>0} f(x+t\Delta x)$ via binary search
\item[backtracking line search] with parameters $\alpha\in\left]0, \frac{1}{2}\right[$ and $\beta\in\left]0, 1\right[$: shrink $t$ geometrically until the sufficient-decrease condition holds.
\vspace{-2em}
\begin{algorithm}
$t \coloneqq1$.\\
\While{$f(x+t\Delta x) > f(x)+\alpha t \nabla f(x)^T \Delta x$}{
$t \coloneqq\beta t$.
}
\end{algorithm}
\vspace{-2em}
\end{description}
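A direct Python transcription of the backtracking pseudocode above, specialized to the scalar case (in $\mathbb{R}^n$ the product $\nabla f(x)^T \Delta x$ becomes a dot product); the objective $f(x) = x^2$ and the starting point are our own toy choices:

```python
# Backtracking line search (scalar case), mirroring the pseudocode above.

def backtrack(f, grad_f, x, dx, alpha=0.25, beta=0.5):
    t = 1.0
    # Shrink t until the sufficient-decrease condition holds.
    while f(x + t * dx) > f(x) + alpha * t * grad_f(x) * dx:
        t *= beta
    return t

# Toy example on f(x) = x^2 with descent direction dx = -f'(x).
f = lambda x: x ** 2
grad = lambda x: 2 * x
x = 3.0
dx = -grad(x)                  # -6.0
t = backtrack(f, grad, x, dx)
print(t, x + t * dx)           # accepts t = 0.5, landing at x = 0.0
```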
\begin{figure}
%TODO line searches
\end{figure}
To determine the descent direction, the obvious starting point is the gradient descent method, that is, choose $\Delta x = -\nabla f(x)$.
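Combining the generic descent loop, the gradient direction $\Delta x = -\nabla f(x)$, and backtracking gives a minimal sketch (scalar case; the objective $f(x) = (x-4)^2$, the tolerance, and all parameters are illustrative assumptions):

```python
# Gradient descent with backtracking line search (scalar case, toy sketch).

def grad_descent(f, grad_f, x, tol=1e-8, alpha=0.25, beta=0.5):
    while abs(grad_f(x)) > tol:       # stopping criterion: small gradient
        dx = -grad_f(x)               # descent direction
        t = 1.0                       # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * grad_f(x) * dx:
            t *= beta
        x += t * dx                   # take the step
    return x

f = lambda x: (x - 4.0) ** 2
grad = lambda x: 2.0 * (x - 4.0)
x_min = grad_descent(f, grad, 0.0)
print(round(x_min, 6))  # 4.0, the minimizer
```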