4 What is optimization?

In this chapter we will denote the set of column vectors with $n$ rows by $\mathbb{R}^n$. The arithmetic of matrices applies, i.e., we may add vectors in $\mathbb{R}^n$ and multiply them by a number in $\mathbb{R}$.
In the next chapter we will introduce them as Euclidean vector spaces. The term Euclidean refers to a norm: a function measuring the size of a vector. In this chapter we only need their structure as column vectors.

4.1 What is an optimization problem?

An optimization problem consists of maximizing or minimizing a function subject to constraints.
Below are two classical examples related to minimizing (non-linear) functions subject to (non-linear) constraints. These are actually examples of convex optimization problems. More about that later.
A cylindrical can is supposed to have a given volume $V$. The material used for the top and bottom costs $a$ DKK per $\mathrm{cm}^2$ and the material used for the side costs $b$ DKK per $\mathrm{cm}^2$. Give the dimensions, the radius $r$ and the height $h$, of the can minimizing the price of the materials.
The cost of the top and bottom pieces is $2 \pi r^2 a$. The cost of the side material is $2 \pi r h\, b$. The constraint is that the volume must be $V$. This is expressed in the equation $\pi r^2 h = V$. All in all the optimization problem is
$$\text{minimize } 2 \pi a r^2 + 2 \pi b r h \quad \text{subject to } \pi r^2 h = V,$$
where $a$ and $b$ are constants.
Can you see a way of solving this optimization problem by eliminating $h$ in the constraint $\pi r^2 h = V$?
Hint
$h = \dfrac{V}{\pi r^2}$ can be inserted in the cost $2 \pi a r^2 + 2 \pi b r h$. Why is this helpful?
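To see the elimination at work, here is a small Sage sketch minimizing the cost after inserting $h = V/(\pi r^2)$; the constants $V = 1000$, $a = 2$ and $b = 1$ are made up for illustration, since no specific values are fixed above:

V = 1000; a = 2; b = 1                      # made-up constants for illustration
r = var('r')
cost = 2*a*pi*r^2 + 2*b*V/r                 # 2*pi*b*r*h with h = V/(pi*r^2) inserted
fmin, rstar = find_local_minimum(cost, 0.1, 20)
hstar = V/(pi*rstar^2)
print(rstar, hstar, fmin)                   # optimal radius, height and cost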
A person is in distress $a$ meters from the beach. The life guard spots the situation, but is $b$ meters from where he would naturally jump in the water as indicated below. The life guard runs $v_1$ m/s on the beach and swims $v_2$ m/s in the water. How far ($x$ meters) should he run along the beach before jumping into the water in order to minimize the time needed to reach the person in distress?
The time spent moving with a speed of $v$ over a distance of $d$ is
$$t = \frac{d}{v}.$$
If the life guard jumps in the water at the point $x$, he will have to swim a distance of
$$\sqrt{a^2 + (b - x)^2}$$
using the Pythagorean theorem. Therefore the optimization problem becomes
$$\text{minimize } \frac{x}{v_1} + \frac{\sqrt{a^2 + (b - x)^2}}{v_2} \quad \text{subject to } x \ge 0.$$
Strictly speaking we do not need the constraint $x \ge 0$, as the life guard is free to run in the other direction. So the optimization problem is simply to minimize
$$f(x) = \frac{x}{v_1} + \frac{\sqrt{a^2 + (b - x)^2}}{v_2}$$
with no strings attached, i.e., $x$ is just assumed to be any real number.
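As a quick numerical illustration, the unconstrained problem can be handed to Sage; the distances and speeds below are made up, since no specific values are fixed above:

a = 50; b = 100; v1 = 5; v2 = 2             # made-up distances (m) and speeds (m/s)
x = var('x')
t = x/v1 + sqrt(a^2 + (b - x)^2)/v2         # total time as a function of x
tmin, xstar = find_local_minimum(t, -10, b)
print(xstar, tmin)                          # optimal jumping point and minimal time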
You need to build a rectangular fence in front of your house for a herb garden. Your house will make up one side of the rectangle, so you only need to build the other three sides. Suppose you have 10 m of wire. What is the maximum area of herb garden you can fence in?

4.2 General definition

An optimization problem consists of a subset $S \subseteq \mathbb{R}^n$ and a function $f: S \to \mathbb{R}$. We will consider optimization problems in the context of minimization. Optimize in this situation means minimize.
In our most general setting an optimization problem looks like
$$\text{minimize } f(x) \quad \text{subject to } x \in S,$$
where $S$ and $U$ are subsets of $\mathbb{R}^n$ with $S \subseteq U$ and $f: U \to \mathbb{R}$ is a function. A solution to the optimization problem is a vector $x^* \in S$, such that
$$f(x^*) \le f(x)$$
for every $x \in S$. Here $x^*$ is called an optimum and $f(x^*)$ is called the optimal value.
We will often write the optimization problem defined above in short form as
$$\min \{ f(x) \mid x \in S \}.$$
The complexity of the problem depends very much on the nature of $f$ and $S$. Also, we cannot even be certain that an optimization problem has a solution. Consider for example the problem
$$\min \{ x \mid x \in \mathbb{R},\ x > 0 \}.$$
Here $x$ can be made arbitrarily small subject to the constraint $x > 0$ and the problem has no solution: no $x^* > 0$ satisfies $x^* \le x$ for every $x > 0$.
We have deliberately not included maximization problems in Definition 4.5. This is because a maximization problem, such as
$$\text{maximize } f(x) \quad \text{subject to } x \in S, \qquad (4.1)$$
can be formulated as the minimization problem
$$\text{minimize } -f(x) \quad \text{subject to } x \in S.$$
Again, we will use the short notation
$$\max \{ f(x) \mid x \in S \}$$
for the maximization problem in (4.1). A solution to (4.1) is a vector $x^* \in S$, such that
$$f(x^*) \ge f(x)$$
for every $x \in S$. Again, $x^*$ is called an optimum and $f(x^*)$ the optimal value.
Suppose that the maximization problem
$$\text{maximize } f(x) \quad \text{subject to } x \in S \qquad (4.3)$$
is formulated as the minimization problem
$$\text{minimize } -f(x) \quad \text{subject to } x \in S. \qquad (4.4)$$
Show that $-z$ is the optimal value and $x^*$ the optimum for (4.3) if $z$ is the optimal value and $x^*$ the optimum for (4.4).
Suppose that . Solve the optimization problem

4.3 Convex optimization

Particularly well behaved optimization problems are the convex ones. These are optimization problems where $S$ is a convex subset and $f$ a convex function in Definition 4.5. To define these concepts we first introduce the notion of a line in $\mathbb{R}^n$.
A line is a subset of $\mathbb{R}^n$ of the form
$$\{ x_0 + t d \mid t \in \mathbb{R} \},$$
where $x_0, d \in \mathbb{R}^n$ with $d \ne 0$.
A line in the plane is (usually) given by its equation
$$y = a x + b. \qquad (4.5)$$
This means that it consists of points $(x, y)$ satisfying $y = a x + b$. Here $a$ can be interpreted as the slope of the line and $b$ the intersection with the $y$-axis.
What about all the points $(x, y)$ with $x = c$ for a fixed number $c$? Certainly they also deserve to be called a line. However, they do not satisfy an equation like (4.5). Informally, this line has infinite slope.
Therefore we introduce the parametric representation of a line: a line is the set of points of the form
$$x_0 + t d, \qquad (4.6)$$
where $t \in \mathbb{R}$,
$x_0$ is any point on the line and
$d \ne 0$ is a (directional) vector.
Example of a line in $\mathbb{R}^2$ with (directional) vector $d$ through the point $x_0$.
Given two distinct points
$$u, v \in \mathbb{R}^n,$$
there is one and only one line passing through them. This line is given by
$$\{ u + t (v - u) \mid t \in \mathbb{R} \}. \qquad (4.7)$$
How do we convert the line in (4.5) to the parametric form (4.6)? Well, we know that the two distinct points
$$u = (0, b) \quad\text{and}\quad v = (1, a + b)$$
are on the line. Therefore it is given by
$$\{ (0, b) + t (1, a) \mid t \in \mathbb{R} \}$$
by (4.7).
Compute the parametric representation of the line through two given points in the plane. Also compute $a$ and $b$ in the representation $y = a x + b$ for this line.
What is the parametric representation of the line consisting of the points $(x, y)$ with $x = c$?
Show in Definition 4.9 that if the line is given by $x_0$ and $d$, then you might as well replace $d$ by $\lambda d$, where $\lambda$ is a real number and $\lambda \ne 0$. It gives the same line.
Show that there is a unique line passing through two distinct points $u, v \in \mathbb{R}^n$ and that it is given by $x_0 = u$ and $d = v - u$ in Definition 4.9.
Do the points
lie on the same line in ?
Show that the line through two distinct points $u, v \in \mathbb{R}^n$ is equal to the subset
$$\{ (1 - t) u + t v \mid t \in \mathbb{R} \}.$$
A convex subset $C \subseteq \mathbb{R}^n$ is a subset that contains the line segment between any two of its points, i.e.,
$$(1 - t) u + t v \in C$$
for every $u, v \in C$ and every number $t$ with $0 \le t \le 1$.
Example of a non-convex subset of $\mathbb{R}^2$.
Which of the subsets below are convex?
The points on a line in $\mathbb{R}^n$.
A closed interval in $\mathbb{R}$ is a subset of the form
$$[a, b] = \{ x \in \mathbb{R} \mid a \le x \le b \}$$
for $a \le b$. Prove that $[a, b]$ is a convex subset of $\mathbb{R}$.
Hint
Keep cool and just apply the definitions! First of all, $x \in [a, b]$ if and only if
$$a \le x \le b.$$
Now pick any $t$ with $0 \le t \le 1$. We must show that if $x \in [a, b]$ and $y \in [a, b]$, then
$$(1 - t) x + t y \in [a, b].$$
You may also write this out as:
$$a \le x \le b \quad\text{and}\quad a \le y \le b$$
implies that
$$a \le (1 - t) x + t y \le b.$$
Hint
Multiplying $a \le x \le b$ by $1 - t \ge 0$ and $a \le y \le b$ by $t \ge 0$ implies that
$$(1 - t) a \le (1 - t) x \le (1 - t) b \quad\text{and}\quad t a \le t y \le t b.$$
What is $(1 - t) a + t a$?
Let $C_1$ and $C_2$ be convex subsets of $\mathbb{R}^n$. Prove that $C_1 \cap C_2$ is a convex subset of $\mathbb{R}^n$. Generalize this to show that if $C_i$, $i \in I$, are any number of convex subsets of $\mathbb{R}^n$, then their intersection
$$\bigcap_{i \in I} C_i$$
is a convex subset of $\mathbb{R}^n$. Is the union of two convex subsets necessarily convex?
A convex function is a function $f: C \to \mathbb{R}$ defined on a convex subset $C \subseteq \mathbb{R}^n$, such that
$$f((1 - t) x + t y) \le (1 - t) f(x) + t f(y)$$
for every $x, y \in C$ and every number $t$ with $0 \le t \le 1$.
Graph of a convex function. The line segment between $(x, f(x))$ and $(y, f(y))$ lies above the graph.
  1. Let the function $f: \mathbb{R} \to \mathbb{R}$ be given by $f(x) = a x + b$, where $a, b \in \mathbb{R}$. Show that $f$ is a convex function.
    Hint
    Write out both sides of the inequality in the definition; here they are in fact equal. Try the case $b = 0$ first.
  2. Can you at this point prove that $f(x) = x^2$ is a convex function?
    Hint
    Simplify
    $$(1 - t) x^2 + t y^2 - ((1 - t) x + t y)^2$$
    to an expression that has to be non-negative.
    Hint
    The expression simplifies to $t (1 - t) (x - y)^2$.
  3. Using that $x \mapsto x^2$ is a convex function, prove that $x \mapsto x^4$ is a convex function.
    Hint
    Use that $u \le v$ implies $u^2 \le v^2$ if $0 \le u$ (here we really need $0 \le u$, since for example $-2 \le 1$, but $(-2)^2 \le 1^2$ is not true) to conclude that
    $$((1 - t) x + t y)^4 \le ((1 - t) x^2 + t y^2)^2 \le (1 - t) x^4 + t y^4$$
    for $0 \le t \le 1$.
  4. It is a fact that $x \mapsto x^3$ is not a convex function (see also the numerical check after this exercise), but can you explain this using the definition of a convex function?
    Hint
    Try $x = -1$ and $y = 0$.
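The following small Python check (runnable in a Sage cell) is not a proof, but it samples the convexity inequality at random points and typically finds a violation for $x^3$ while finding none for $x^2$:

import random

def convexity_violated(f, trials=10000):
    # search for x, y, t with f((1-t)x + ty) > (1-t)f(x) + t f(y)
    for _ in range(trials):
        x, y = random.uniform(-5, 5), random.uniform(-5, 5)
        t = random.uniform(0, 1)
        if f((1 - t)*x + t*y) > (1 - t)*f(x) + t*f(y) + 1e-9:
            return (x, y, t)
    return None

print(convexity_violated(lambda x: x**2))   # None: no violation found
print(convexity_violated(lambda x: x**3))   # typically a counterexample (x, y, t)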
Let $f: C \to \mathbb{R}$ be a convex function. Then the subset
$$\{ x \in C \mid f(x) \le \alpha \}$$
is a convex subset of $\mathbb{R}^n$, where $\alpha \in \mathbb{R}$.
Suppose that $x, y \in C$ satisfy $f(x) \le \alpha$ and $f(y) \le \alpha$, and that $0 \le t \le 1$. Looking at Definition 4.18 we must prove that
$$f((1 - t) x + t y) \le \alpha.$$
By the definition of $f$ being convex (Definition 4.23), it follows that
$$f((1 - t) x + t y) \le (1 - t) f(x) + t f(y).$$
But, since $f(x) \le \alpha$ and $f(y) \le \alpha$ we have
$$(1 - t) f(x) \le (1 - t) \alpha \quad\text{and}\quad t f(y) \le t \alpha,$$
and therefore
$$(1 - t) f(x) + t f(y) \le (1 - t) \alpha + t \alpha = \alpha.$$
Therefore,
$$f((1 - t) x + t y) \le \alpha$$
and $(1 - t) x + t y \in \{ x \in C \mid f(x) \le \alpha \}$.
We do not have the tools yet to prove the crucial result about convex optimization problems, but at least we can state it.
In hunting for optimal solutions to an optimization problem one is often stuck with a point $x_0 \in S$, which is optimal locally. This means that $f(x_0) \le f(x)$ for every $x \in S$ that is sufficiently close to $x_0$ (we will explain what this means in the next chapter). The remarkable thing that happens in a convex optimization problem is that if $x_0$ is optimal locally, then it is a global optimum! It satisfies $f(x_0) \le f(x)$ not only for $x$ close to $x_0$, but for every $x \in S$.
The optimization problem in Exercise 4.8 is a very typical convex optimization problem.
Below you see a plot of a function (press Compute) restricted to an interval. You can see that it has a local minimum and also that this minimum is not a global minimum (the function takes smaller values elsewhere in the interval). So the function is not convex on this interval (but if you look at it more locally, on a small interval around the local minimum, it is a convex function).
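Since the interactive cell does not reproduce here, the following Sage sketch plots a stand-in function with the same behavior; the function below is made up for illustration and is not the one referred to above:

f(x) = x^4 - 3*x^2 + x                      # made-up example with two local minima
plot(f, (x, -2, 2))                         # the local minimum near x = 1.1 is not global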
Solve the optimization problem
for and .

4.4 Linear optimization

We will start this section with a concrete example.
A company produces two products $A$ and $B$, each selling at a fixed price per unit. There are certain limited resources in the production of $A$ and $B$. Two raw materials $M_1$ and $M_2$ are needed along with employee work time. The production of one unit of $A$ requires a certain number of minutes of work time, one unit of $M_1$ and six units of $M_2$. The production of one unit of $B$ requires a certain number of minutes of work time, one unit of $M_1$ and eight units of $M_2$. Only limited amounts of employee work time, of $M_1$ and of $M_2$ are available. These constraints in the production can be outlined in the diagram below.
How many units of $A$ and of $B$ should the company produce to maximize its profit?
You can rewrite this as the optimization problem
This optimization problem is a special case of linear optimization, which arguably is one of the most successful applications of mathematics (after the introduction of the simplex algorithm following World War II). We will give a taste of the mathematical setup here.
The simplest convex optimization problems are the linear ones. Recall that a linear function $f: \mathbb{R}^n \to \mathbb{R}$ has the form
$$f(x_1, \dots, x_n) = c_1 x_1 + \dots + c_n x_n$$
for $c_1, \dots, c_n \in \mathbb{R}$. Usually we write this with matrix notation as
$$f(x) = c^T x,$$
where
$$c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Show that a linear function is convex.
A linear optimization problem is not about minimizing a linear function over an arbitrary convex subset. We choose the convex subset as an intersection of subsets of the form
$$\{ x \in \mathbb{R}^n \mid a^T x \le \beta \},$$
where $a \in \mathbb{R}^n$ is a non-zero vector and $\beta$ a number, i.e., a linear optimization problem has the form
$$\text{minimize } c^T x \quad \text{subject to } x \in P,$$
where
$$P = \{ x \in \mathbb{R}^n \mid a_1^T x \le \beta_1, \dots, a_m^T x \le \beta_m \} \qquad (4.9)$$
and $a_1, \dots, a_m \in \mathbb{R}^n$ and $\beta_1, \dots, \beta_m \in \mathbb{R}$.
Use a selection of previous exercises to show that the subset $P$ defined in (4.9) is a convex subset of $\mathbb{R}^n$.
Using matrix notation we write $P$ as
$$P = \{ x \in \mathbb{R}^n \mid A x \le b \},$$
where $A$ is the $m \times n$ matrix with row vectors $a_1^T, \dots, a_m^T$ and
$$b = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}.$$
Here $A x \le b$ means that every entry of $A x$ is less than or equal to the corresponding entry of $b$.
Here is a concrete example for . The optimization problem
translates into matrix notation with the matrices
In this case it is helpful to draw the optimization problem in the plane $\mathbb{R}^2$. This is done below.
Constraints pictured as shaded area above. Optimum occurs in a vertex (corner).
We will give a general (but rather slow) algorithm below for solving linear optimization problems. In fact, it all boils down to solving systems of linear inequalities. Sometimes linear optimization is referred to as linear programming. The basic theory of linear programming was pioneered by, among others, one of the inventors of the modern computer, John von Neumann.
John von Neumann (1903-1957). Picture from LANL.
Sage has much more advanced algorithms built in for solving (integer) linear optimization problems. I have translated the linear optimization problem in Example 4.31 into Sage below.
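The interactive cell is not reproduced here, but a linear optimization problem of the form above is set up in Sage along the following lines; the numbers are made up and are not the data of Example 4.31:

# maximize c^T x subject to A x <= b and x >= 0 (made-up data)
LP = MixedIntegerLinearProgram(maximization=True, solver="GLPK")
x = LP.new_variable(real=True, nonnegative=True)
LP.add_constraint(30*x[0] + 20*x[1] <= 2700)   # employee work time
LP.add_constraint(x[0] + x[1] <= 120)          # raw material M1
LP.add_constraint(6*x[0] + 8*x[1] <= 800)      # raw material M2
LP.set_objective(100*x[0] + 150*x[1])          # profit
LP.solve()
print(LP.get_values(x))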

4.5 Fourier-Motzkin elimination

Fourier-Motzkin elimination is a classical method (dating back to 1826) for solving linear inequalities. It is also a key ingredient in an algorithm for solving linear optimization problems.
I am convinced that the best way to explain this method is by way of an extended example. For more formalities you may consult Chapter 1 of my book Undergraduate Convexity.
Consider the linear optimization problem
We might as well write this as
by adding the extra variable $z$. This enables us to reformulate the problem as follows: find the maximal value of $z$, such that there exists $(x_1, x_2)$ with $(z, x_1, x_2) \in S$, where $S$ is the set of solutions to the system (4.11) of inequalities. (An equality $a = b$ is logically equivalent to the two inequalities $a \le b$ and $b \le a$: it holds if and only if both of them hold.)
We have the Gauss elimination method for solving systems of linear equations. How do we now solve (4.11), where we also have inequalities?
Well, at first we can actually do a Gauss elimination step by eliminating $x_1$ using the equation, i.e., by expressing $x_1$ in terms of the other variables. This expression is then inserted into the inequalities in (4.11):
and we get the system (4.12) of inequalities in the variables $z$ and $x_2$. Now we only have inequalities left and we have to invent a trick for eliminating $x_2$. Let us isolate $x_2$ on one side of the inequality signs $\le$ and $\ge$:
Written a little differently this is the same as
Now the scene is set for elimination of $x_2$. Listen carefully. First the inequalities in (4.12) can be boiled down to the two inequalities in (4.13) by using (repeatedly) that $a \le c$ and $b \le c$ is the same as $\max(a, b) \le c$, and that $c \le a$ and $c \le b$ is the same as $c \le \min(a, b)$, for numbers $a, b, c$.
Then, finally, comes the (Fourier-Motzkin) elimination step: the existence of a solution $x_2$ to (4.13) is equivalent to the single inequality saying that the lower bound in (4.13) is less than or equal to the upper bound. This single inequality can be exploded or expanded (see Exercise 4.34) into the following inequalities:
Similarly to (4.12) we now isolate $z$ from the above inequalities and find an upper bound for $z$.
Therefore the maximum in the optimization problem (4.10) is this upper bound for $z$. How do we now find numbers $x_1, x_2$ satisfying the constraints in the optimization problem (4.10) with $z$ equal to this maximum?
This is simply done by inserting the maximal value of $z$ first in (4.13). Here you get two inequalities bounding $x_2$ from below and above, and in our case they force a unique value of $x_2$. Since we had expressed $x_1$ in terms of the other variables from the very beginning, we therefore get $x_1$ as well, and we have the unique solution to the optimization problem.
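For the curious, here is a compact Python sketch (our own illustration, not code from the text) of one Fourier-Motzkin elimination step for a system of inequalities $a^T x \le \beta$:

from fractions import Fraction

def fm_eliminate(rows, j):
    # rows is a list of pairs (a, beta) encoding a . x <= beta;
    # returns an equivalent system with the variable x_j eliminated
    pos = [r for r in rows if r[0][j] > 0]    # upper bounds on x_j
    neg = [r for r in rows if r[0][j] < 0]    # lower bounds on x_j
    out = [r for r in rows if r[0][j] == 0]   # x_j does not occur
    for ap, bp in pos:
        for an, bn in neg:
            # the positive multipliers ap[j] and -an[j] cancel the x_j terms
            a = [ap[j]*an[i] - an[j]*ap[i] for i in range(len(ap))]
            b = ap[j]*bn - an[j]*bp
            out.append((a, b))
    return out

# tiny usage example: x1 <= 4, -x1 + x2 <= 0, -x2 <= -1; eliminate x1
rows = [([1, 0], Fraction(4)), ([-1, 1], Fraction(0)), ([0, -1], Fraction(-1))]
print(fm_eliminate(rows, 0))                  # inequalities in x2 only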
What is the solution if we replace Maximize with Minimize in the optimization problem (4.10)?
Prove the following:
Let $a_1, \dots, a_m$ and $b_1, \dots, b_n$ be numbers. Then
$$\max(a_1, \dots, a_m) \le \min(b_1, \dots, b_n)$$
if and only if the inequalities
$$a_i \le b_j \quad\text{for } i = 1, \dots, m \text{ and } j = 1, \dots, n$$
are satisfied.
The following is Exercise 1.8 from my book Undergraduate Convexity.
A vitamin pill is produced using two ingredients $M_1$ and $M_2$. The pill needs to satisfy four constraints for the vital vitamins $V_1$ and $V_2$: it must contain at least a certain minimum and at most a certain maximum number of milligrams of $V_1$, and similarly for $V_2$. Each of the ingredients $M_1$ and $M_2$ contains a known number of milligrams of $V_1$ and of $V_2$ per gram:
Let $x$ denote the amount of $M_1$ and $y$ the amount of $M_2$ (measured in grams) in the production of a vitamin pill. Write down a system of linear inequalities in $x$ and $y$ describing the constraints above.
We want a vitamin pill of minimal weight satisfying the constraints. How many grams of $M_1$ and $M_2$ should we mix?
Use Fourier-Motzkin elimination to solve this problem.
Check your solution by modifying the input to the Sage code in Example 4.32 using Remark 4.6.
One may also force minimization by inserting the option
LP = MixedIntegerLinearProgram(maximization=False, solver="GLPK")
in Example 4.32.
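A minimization skeleton for the vitamin pill exercise could then look as follows; all coefficients below are made up and must be replaced by the data of the exercise:

LP = MixedIntegerLinearProgram(maximization=False, solver="GLPK")
g = LP.new_variable(real=True, nonnegative=True)   # g[0], g[1]: grams of the two ingredients
LP.add_constraint(3*g[0] + 2*g[1] >= 6)     # at least ... mg of the first vitamin (made-up data)
LP.add_constraint(3*g[0] + 2*g[1] <= 15)    # at most ... mg of the first vitamin (made-up data)
LP.add_constraint(2*g[0] + 3*g[1] >= 5)     # at least ... mg of the second vitamin (made-up data)
LP.add_constraint(2*g[0] + 3*g[1] <= 12)    # at most ... mg of the second vitamin (made-up data)
LP.set_objective(g[0] + g[1])               # total weight of the pill
LP.solve()
print(LP.get_values(g))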

4.6 Application in machine learning and data science

To start with, consider a toy example of a machine learning problem: we wish to tell the gender of a person based on a data point consisting of the height and weight of the person.
To do this we train our model by measuring the height and weight of a lot of people. Each of these measured data points is labeled female or male according to the gender of the person.
Given a new data point, we wish to tell if the person is female or male. Here we consider a very simple model for doing this. First we need to introduce some new mathematical terms. We will introduce the terms generally for data points in $\mathbb{R}^n$ and not just in $\mathbb{R}^2$ as above.
A hyperplane in $\mathbb{R}^n$ is a generalization of a line in the plane. In general a hyperplane is defined as the set of points $x \in \mathbb{R}^n$ satisfying
$$a^T x = \beta,$$
where $a \in \mathbb{R}^n$ is a non-zero vector and $\beta$ is a number. A hyperplane divides $\mathbb{R}^n$ into two subsets: the points above or on the hyperplane satisfying $a^T x \ge \beta$ and the ones below the hyperplane satisfying $a^T x < \beta$.
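In code, deciding on which side of a hyperplane a point lies takes one line; a small Python sketch with hypothetical data:

a = [2.0, -1.0]; beta = 1.0                 # hypothetical hyperplane a^T x = beta in R^2
x = [3.0, 0.5]                              # point to classify
side = sum(ai*xi for ai, xi in zip(a, x)) - beta
print("above or on" if side >= 0 else "below")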
Suppose we are given a data set as a finite set of points in $\mathbb{R}^n$ and that each of these points is labeled with either a blue or a red color. We wish to find a hyperplane such that the blue points are above and the red points are below the hyperplane.
We may then use the hyperplane to predict the label of a new point. The label could be gender, whether you win or lose money buying a stock, or anything else with a binary classifier.

4.6.1 Formulation as a linear optimization problem

Suppose that the points labeled blue are $v_1, \dots, v_k$ and the points labeled red are $w_1, \dots, w_l$. Then we wish to find $a \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$, such that
$$a^T v_i > \beta$$
for $i = 1, \dots, k$ and
$$a^T w_j < \beta$$
for $j = 1, \dots, l$. One can show that these strict inequalities may be solved for $a$ and $\beta$ if and only if the inequalities
$$a^T v_i \ge \beta + 1 \quad\text{and}\quad a^T w_j \le \beta - 1$$
are solvable for $a$ and $\beta$, where $i = 1, \dots, k$ and $j = 1, \dots, l$.
It is, however, not realistic to expect data to behave this nicely. Instead one invents the rather ingenious linear optimization problem
$$\text{minimize } \frac{1}{k} \sum_{i=1}^{k} e_i + \frac{1}{l} \sum_{j=1}^{l} f_j \quad\text{subject to}\quad a^T v_i \ge \beta + 1 - e_i,\ \ a^T w_j \le \beta - 1 + f_j,\ \ e_i \ge 0,\ f_j \ge 0 \qquad (4.15)$$
for $i = 1, \dots, k$ and $j = 1, \dots, l$. This linear optimization problem has optimal value zero if and only if the data can be separated strictly. Otherwise, it finds a hyperplane minimizing the mean errors for the points involved.
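A sketch of how (4.15) might be set up in Sage for points in the plane; the data below are made up:

blue = [(1, 2), (2, 3), (0, 1)]             # made-up blue points v_i
red = [(3, 0), (4, 1), (5, 2)]              # made-up red points w_j
LP = MixedIntegerLinearProgram(maximization=False, solver="GLPK")
a = LP.new_variable(real=True, nonnegative=False)   # a[0], a[1]: normal vector; a[2] plays the role of beta
e = LP.new_variable(real=True, nonnegative=True)    # errors e_i for blue points
f = LP.new_variable(real=True, nonnegative=True)    # errors f_j for red points
for i, v in enumerate(blue):
    LP.add_constraint(a[0]*v[0] + a[1]*v[1] - a[2] + e[i] >= 1)
for j, w in enumerate(red):
    LP.add_constraint(a[0]*w[0] + a[1]*w[1] - a[2] - f[j] <= -1)
LP.set_objective((1/len(blue))*sum(e[i] for i in range(len(blue)))
                 + (1/len(red))*sum(f[j] for j in range(len(red))))
LP.solve()                                  # optimal value 0 if and only if the data separate strictly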
The linear optimization problem (4.15) may look far removed from the real world, but it has been used very successfully in the diagnosis and prognosis of breast cancer. See Mangasarian et al.
In the Sage window below we have implemented the solution of the linear optimization problem (4.15), where the output is a graphical illustration of the optimal line that separates the points in xpts and ypts with the smallest mean error, as defined in the function to be minimized in (4.15).