9 Convex optimization
Below is an example of a convex optimization problem in the plane .
where is the subset of points in satisfying
Sketch the subset in Example 9.1. Show that Example 9.1 really is a convex optimization problem and solve it.
Given points , what is the center and radius of the smallest circle containing these points? We can write this optimization problem as
where
Upon rewriting, this turns into the optimization problem
where
and .
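A hedged sketch of what the two formulations typically look like for this problem; the exact statements of (9.1) and (9.2) are given in the text and may use different notation. With center $c \in \mathbb{R}^2$ and radius $r$, the problem is
\[
\text{minimize } r \quad \text{subject to} \quad \lVert c - p_i \rVert \le r, \qquad i = 1, \dots, m,
\]
and expanding $\lVert c - p_i \rVert^2 \le r^2$ and substituting $s = r^2 - \lVert c \rVert^2$ gives the equivalent problem
\[
\text{minimize } \lVert c \rVert^2 + s \quad \text{subject to} \quad \lVert p_i \rVert^2 - 2\langle p_i, c \rangle \le s, \qquad i = 1, \dots, m,
\]
with affine constraints and optimal value $r^2$.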
Prove that (9.1) and (9.2) both are convex optimization problems. Explain how (9.1) is rewritten into (9.2).
Hint
Expand and put .
9.1 Finding the optimal hyperplane separating data
How do you go from (9.4) to (9.5)?
Suppose that
Let
Show that and . How can and be applied in
constructing and ?
Consider
What is special about ?
Let be the hyperplane in given by and let .
The point closest to in can be found by solving the optimization problem
Explain why (9.6) is a convex optimization problem. Show how Theorem 7.37 can be used to solve this optimization problem by first deducing the equations
for the Lagrange multiplier . Notice here that above really contains equations, whereas is only one equation in , where . Solve the equations (9.7) for and .
How can we be sure that really is a minimum in (9.6)?
Finally show that the distance from to is given by the formula
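For orientation: assuming the hyperplane is $H = \{x \in \mathbb{R}^n : \langle a, x \rangle = b\}$ with $a \neq 0$ and the point is $q$, the expected formula is the familiar
\[
d(q, H) = \frac{\lvert \langle a, q \rangle - b \rvert}{\lVert a \rVert}.
\]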
Let the points be labeled by . Then the optimal hyperplane
separating the points
is given by the optimization problem
for .
Let us explicitly write up the optimization problem in Theorem 9.9 in a very simple situation: finding the best line separating the points and . In the notation of (9.5), we have (without the stars on and )
so that
The points are
where and .
Therefore the optimization problem in Theorem 9.9 becomes
Solve the optimization problem (9.8) and verify that the best line from the optimization problem is the one we expect it to be. Also, check how WolframAlpha solves this optimization problem.
Hint
You could maybe use Fourier-Motzkin elimination to show that
implies .
Notice that the dual optimization problem is an optimization problem in with , where is the number of data points.
This can be in stark contrast to the original optimization problem in Theorem 9.9, which is an optimization problem in , where is the dimension of the data points.
Sometimes the data points are high dimensional and it pays to solve the dual optimization problem.
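For concreteness, and under the assumption that Theorem 9.9 is the usual maximal margin formulation, the dual optimization problem referred to here typically reads
\[
\max_{\alpha \in \mathbb{R}^m} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{m} \alpha_i y_i = 0,
\]
an optimization problem in the $m$ dual variables $\alpha_1, \dots, \alpha_m$ alone, independent of the dimension $n$ of the data points.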
Let us write down the dual optimization problem for the points in Example 9.10. Here
so that the dual optimization problem becomes
This reduces to the optimization problem of maximizing subject to ,
which has the solution . Therefore the optimal hyperplane has normal vector
Quadratic optimization problems, such as the one in Theorem 9.9, can in fact be handled by Sage (well, Python in this case). See CVXOPT for further information. Note that the code below needs to be executed as Python code (choose Python in the pull-down menu). It attempts (in general) to solve the optimization problem
In the Sage window below, the optimization problem has been entered.
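The Sage cell itself is not reproduced in this extract. The following is a minimal sketch, not the original code, of how such a quadratic program can be passed to CVXOPT's qp solver with placeholder data for the standard form: minimize 1/2 x^T P x + q^T x subject to G x <= h. The two Sage preamble lines mentioned in the next exercise are included at the top.

RealNumber = float   # Sage preamble: make numeric literals plain Python floats,
Integer = int        # and plain Python ints, as CVXOPT expects
from cvxopt import matrix, solvers

P = matrix([[2.0, 0.0], [0.0, 2.0]])    # inner lists are the columns of P
q = matrix([-2.0, -5.0])
G = matrix([[-1.0, 0.0], [0.0, -1.0]])  # encodes the constraints x1 >= 0, x2 >= 0
h = matrix([0.0, 0.0])

sol = solvers.qp(P, q, G, h)
print(sol['x'])

With these placeholder matrices the solver returns the unconstrained minimizer (1, 2.5), which happens to be feasible.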
What happens if you remove
RealNumber=float
Integer=int
from the code above?
Take a look at the input format in Example 9.14. Can you tell which optimization problem in this chapter is solved below? Also, the code below seems to report some errors after pressing the compute button. Can you make it run smoothly by making a very, very small change?
9.1.1 Separating by non-linear functions
An even simpler example is given in dimension one.
Consider the points (or numbers)
with respective labels . These cannot be separated by a hyperplane
. Things change dramatically if we use the function to embed the points in
. Using , the points (9.13) map to
with the (same) labels . Here they can be separated by the hyperplane
. This means that the original numbers can be separated by the non-linear function
or rather so that and .
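The points in (9.13) are not reproduced in this extract, so the following small numerical check uses the classic instance of this idea as an assumption: points -1, 0, 1 with labels -1, +1, -1 and the embedding phi(x) = (x, x^2).

points = [-1.0, 0.0, 1.0]
labels = [-1, 1, -1]

def phi(x):
    # hypothetical embedding of the real line into the plane
    return (x, x * x)

# A separating hyperplane in the plane: w . z + b = 0 with w = (0, -1), b = 1/2,
# i.e. the horizontal line x2 = 1/2. Pulled back to the line this is the
# non-linear separator 1/2 - x**2 = 0.
w, b = (0.0, -1.0), 0.5

for x, y in zip(points, labels):
    z = phi(x)
    value = w[0] * z[0] + w[1] * z[1] + b
    print(x, y, value, "correct" if value * y > 0 else "wrong")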
9.1.2 Kernel functions
Let be given by
Then
One gleans from (9.16) that is a kernel function, since (9.15) is satisfied for given by
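The map and kernel of this example are not reproduced in this extract, so the following check uses a standard instance as an assumption: the kernel K(x, y) = (1 + <x, y>)^2 on the plane, whose feature map is Phi(x) = (1, sqrt(2) x1, sqrt(2) x2, x1^2, sqrt(2) x1 x2, x2^2).

import math, random

def K(x, y):
    return (1 + x[0] * y[0] + x[1] * y[1]) ** 2

def Phi(x):
    s = math.sqrt(2)
    return (1, s * x[0], s * x[1], x[0] ** 2, s * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# numerical check of the kernel identity K(x, y) = <Phi(x), Phi(y)>
for _ in range(5):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    y = (random.uniform(-2, 2), random.uniform(-2, 2))
    assert abs(K(x, y) - dot(Phi(x), Phi(y))) < 1e-9
print("kernel identity verified numerically")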
9.1.3 The kernel perceptron algorithm
Use the kernel function in Example 9.17 and the kernel perceptron algorithm to separate
Sketch the points and the separating curve.
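A hedged sketch of the kernel perceptron algorithm is given below; the kernel of Example 9.17 and the points of this exercise are not reproduced in this extract, so a placeholder quadratic kernel and the toy one-dimensional data from Section 9.1.1 are used. A point x is classified by the sign of sum_j alpha_j y_j K(x_j, x).

def kernel_perceptron(points, labels, K, epochs=100):
    alpha = [0] * len(points)           # mistake counts, one per training point
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            s = sum(a * yj * K(xj, x) for a, yj, xj in zip(alpha, labels, points))
            if y * s <= 0:              # misclassified (or on the boundary)
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:               # all points classified correctly
            break
    return alpha

# toy usage with a hypothetical kernel and data
K = lambda x, y: (1 + x * y) ** 2
points, labels = [-1.0, 0.0, 1.0], [-1, 1, -1]
print(kernel_perceptron(points, labels, K))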
9.2 Logarithmic barrier functions
Prove that is a convex function if and are
convex functions.
Hint
Prove and use that if is a decreasing convex function (in one variable) and is a convex function, then
is a convex function, where we assume the composition makes sense.
Let be a point in with
for and . Then
and as . If (9.18) has a unique optimum , then by using we obtain a sequence as .
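In the standard setup (restated here in general terms, since the symbols of (9.18) are not reproduced in this extract), one minimizes the logarithmic barrier function
\[
F_t(x) = f(x) - t \sum_{i=1}^{m} \log\bigl(-g_i(x)\bigr), \qquad t > 0,
\]
over the strictly feasible set $\{x : g_1(x) < 0, \dots, g_m(x) < 0\}$; as $t \to 0$, the minimizers approach an optimal solution of the problem of minimizing $f(x)$ subject to $g_i(x) \le 0$.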
9.2.1 Quadratic function with polyhedral constraints
The optimization problem in Theorem 9.9 has the form (9.20), when we put
Here is a matrix, is an matrix and
.
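The Newton routine called below is not reproduced in this extract. The following is a hypothetical reconstruction, assuming Newton(Q, c, A, b, x0, t) runs damped Newton's method on the log-barrier function for minimizing 1/2 x^T Q x + c^T x subject to A x <= b, starting from the strictly feasible point x0; the routine actually used in the text may differ in its details.

import numpy as np

def Newton(Q, c, A, b, x0, t, tol=1e-10, max_iter=200):
    # Apply Newton's method to the barrier function
    #   F_t(x) = 1/2 x^T Q x + c^T x - t * sum_i log(b_i - a_i^T x).
    Q = np.asarray(Q, dtype=float)
    c = np.asarray(c, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = b - A @ x                               # slacks; strictly positive
        grad = Q @ x + c + t * (A.T @ (1.0 / s))    # gradient of F_t
        hess = Q + t * (A.T * (1.0 / s**2)) @ A     # Hessian of F_t
        step = np.linalg.solve(hess, grad)
        alpha = 1.0                                 # backtrack to stay feasible
        while np.any(b - A @ (x - alpha * step) <= 0):
            alpha *= 0.5
        x_new = x - alpha * step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x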
>>> Newton(Q1, c1, A1, b1, [0,0,18], 1)
array([-0.43814179, 1.84835643, 7.69779763])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.5)
array([-0.45244596, 1.81135335, 4.99353576])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.1)
array([-0.49569549, 1.84020923, 2.91071666])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.05)
array([-0.49875614, 1.87002433, 2.63917096])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.01)
array([-0.4999099 , 1.93327221, 2.28805064])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.005)
array([-0.49996901, 1.95176304, 2.20331123])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.001)
array([-0.49999729, 1.97792915, 2.09031159])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.0005)
array([-0.49999904, 1.98432672, 2.06370265])
>>> Newton(Q1, c1, A1, b1, [0,0,18], 0.0001)
array([-0.49999991, 1.99295433, 2.0283835 ])
The first two coordinates of the output are the - and -coordinates of the center. The third
is from (9.2).
Try out the code in the Sage window above on Exercises 7.46, 7.47 and 7.48. Check the output of the code by actually solving these exercises.
Compute the best line separating the labeled data
((1, 0), +1), ((2, 0), +1), ((3, 0), +1), ((3, 2), +1), ((1, 1), -1), ((2, 2), -1).
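A sketch of how this exercise can be fed to CVXOPT, under the assumption that the problem from Theorem 9.9 is the usual formulation: minimize 1/2 (w1^2 + w2^2) subject to y_i (w . x_i + b) >= 1 over the variables (w1, w2, b).

from cvxopt import matrix, solvers

points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 2.0), (1.0, 1.0), (2.0, 2.0)]
labels = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0]

# objective 1/2 z^T P z with z = (w1, w2, b) and P = diag(1, 1, 0)
P = matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
q = matrix([0.0, 0.0, 0.0])

# each constraint y_i (w . x_i + b) >= 1 becomes the row -y_i*(x_i1, x_i2, 1) <= -1
G = matrix([[-y * px for (px, py), y in zip(points, labels)],
            [-y * py for (px, py), y in zip(points, labels)],
            [-y for y in labels]])
h = matrix([-1.0] * len(points))

sol = solvers.qp(P, q, G, h)
print(sol['x'])   # the optimal (w1, w2, b) describing the separating line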
9.3 A geometric optimality criterion
If is an optimal solution and , then
for every with , where denotes the
epsilon function in the definition of differentiability (see
Definition 7.5). Therefore
for . This is only possible if . We have silently applied the convexity of
and the differentiability of at .
If in addition is convex and (9.22) holds, then Theorem 8.21 shows that is an optimal solution.
Sketch how Proposition 9.26 applies to show that an optimum in a linear programming problem in the plane can always be found at a vertex.
Let be a differentiable convex function and
Suppose that for . Prove that is a minimum for defined on .
Guess the solution to the optimization problem
Show that your guess was correct!
9.4 KKT
The KKT conditions associated with the convex optimization problem in (9.1) are
Verify that the KKT conditions of the optimization problem in (9.1) are the ones given in Example 9.31.
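For reference, and stated in general terms rather than copied from the text: for a convex problem of minimizing $f(x)$ subject to $g_1(x) \le 0, \dots, g_m(x) \le 0$ with differentiable data, the KKT conditions at a point $x$ with multipliers $\lambda_1, \dots, \lambda_m$ read
\[
\nabla f(x) + \lambda_1 \nabla g_1(x) + \dots + \lambda_m \nabla g_m(x) = 0, \qquad
g_i(x) \le 0, \quad \lambda_i \ge 0, \quad \lambda_i\, g_i(x) = 0, \quad i = 1, \dots, m.
\]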
The optimization problem (9.23) is called strictly feasible if there exists with
Give an example of a convex optimization problem as in (9.23) that is not strictly feasible and has an optimal solution that does not satisfy the KKT conditions. Such an example shows that strict feasibility is necessary in Theorem 9.34 (i).
9.5 Computing with KKT
9.5.1 Strategy
So where exactly in (9.24) is the above claim verified?
Why possibilities above?
How do you solve the optimization problem (or decide there is no solution)
if ?
9.5.2 Example
9.6 Optimization exercises
- Show that (9.29) is a convex optimization problem.
- Sketch the set of constraints in and show that cannot be an optimal solution to (9.29).
- Write up the KKT conditions for (9.29) and explain theoretically (without actually solving them) why they must have a solution.
- Now solve (9.29). Is the solution unique?
- Show that is strictly convex.
- Let denote the subset of points satisfying Show that is a closed convex subset.
- Solve the optimization problem
- Why does the optimization problem
- Find all optimal solutions to (9.31).
- Let , where at least one of is non-zero. Show that an optimal solution to belongs to .
- Use the KKT conditions to solve the minimization problem
- Use the KKT conditions to solve the minimization problem
Solve the optimization problem
- State the KKT conditions for for .
- Suppose now that . For which and does have optimum in ? State the KKT conditions when .
- Show that is a convex function.
- Find . Is this minimum unique? Is a strictly convex function?
- Apply the KKT conditions to decide if is an optimal solution to
- Find and
- Show that is not a convex function.
- Show that is a convex function on the open subset and conclude that is convex on .
- Show that is an optimal solution for the optimization problem . Is a unique optimal solution here?
- Show that is a convex function. Is strictly convex?
- Show that is a convex subset of .
- Does there exist an optimal point for the minimization problem with ?
- Does there exist an optimal point for the minimization problem with ?
- Show that is a convex function and solve the minimization problem .
- Show using the KKT conditions that is not optimal for (P).
- Find an optimal solution for (P). Is it unique?