4 What is optimization?
In this chapter we will denote the set of column vectors with $n$ rows by $\mathbb{R}^n$. The arithmetic of matrices applies, i.e., we may add vectors in $\mathbb{R}^n$ and multiply them by a number in $\mathbb{R}$. In the next chapter we will introduce them as Euclidean vector spaces. The term Euclidean refers to a norm: a function measuring the size of a vector. In this chapter we only need the structure as column vectors.

4.1 What is an optimization problem?
An optimization problem consists of maximizing or minimizing a function subject to constraints. Below are two classical examples related to minimizing (non-linear) functions subject to (non-linear) constraints. These are actually examples of convex optimization problems. More about that later.

A cylindrical can is supposed to have a volume of $V$. The material used for the top and bottom costs $a$ DKK per unit of area and the material used for the side costs $b$ DKK per unit of area. Give the dimensions $r$ (radius) and $h$ (height) of the can minimizing the price of the materials.

The cost of the top and bottom pieces is $2\pi r^2 a$. The cost of the side material is $2\pi r h b$. The constraint is that the volume must be $V$. This is expressed in the equation $\pi r^2 h = V$. All in all the optimization problem is
\[
\text{minimize } 2\pi r^2 a + 2\pi r h b \quad \text{subject to } \pi r^2 h = V,
\]
where $a$, $b$ and $V$ are constants.
Can you see a way of solving this optimization problem by eliminating $h$ in the constraint $\pi r^2 h = V$?

Hint

$h = \dfrac{V}{\pi r^2}$ can be inserted in $2\pi r^2 a + 2\pi r h b$. Why is this helpful?
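As a quick numerical sanity check of this elimination idea, here is a minimal Python sketch using scipy; the values of $a$, $b$ and $V$ below are illustrative assumptions, not data from the text.

```python
# Minimal sketch: eliminate h via the volume constraint and minimize the
# cost as a function of r alone. The constants a, b, V are assumed,
# illustrative values -- not the ones from the original problem.
import math
from scipy.optimize import minimize_scalar

a, b, V = 2.0, 1.0, 1000.0   # price of top/bottom, price of side, volume (assumed)

def cost(r):
    h = V / (math.pi * r**2)             # h eliminated using pi*r^2*h = V
    return 2 * math.pi * r**2 * a + 2 * math.pi * r * h * b

res = minimize_scalar(cost, bounds=(1e-3, 100.0), method="bounded")
r_opt = res.x
print(r_opt, V / (math.pi * r_opt**2))   # optimal radius and corresponding height
```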
     
A person is in distress $a$ meters from the beach. The life guard spots the situation, but is $b$ meters from where he would naturally jump in the water as indicated below. The life guard runs $v_1$ m/s on the beach and swims $v_2$ m/s in the water. How far ($x$) should he run along the beach before jumping into the water in order to minimize the time needed to reach the person in distress?

The time spent moving with a speed of $v$ over a distance of $d$ is
\[
\frac{d}{v}.
\]
If the life guard jumps in the water at the point $x$ he will have to swim a distance of
\[
\sqrt{(b - x)^2 + a^2}
\]
using the Pythagorean theorem. Therefore the optimization problem becomes
\[
\text{minimize } \frac{x}{v_1} + \frac{\sqrt{(b - x)^2 + a^2}}{v_2} \quad \text{subject to } x \geq 0.
\]
Strictly speaking we do not need the constraint $x \geq 0$, as the life guard is free to run in the other direction. So the optimization problem is simply to minimize
\[
\frac{x}{v_1} + \frac{\sqrt{(b - x)^2 + a^2}}{v_2}
\]
with no strings attached, i.e., $x$ is just assumed to be any number.
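A similar numerical sketch for the life guard problem is shown below; the values of $a$, $b$, $v_1$ and $v_2$ are assumed for illustration only.

```python
# Minimal sketch of the life guard problem: minimize the total time as a
# function of the jump-in point x. All numbers are assumed, illustrative
# values -- not the ones from the original problem.
import math
from scipy.optimize import minimize_scalar

a, b = 40.0, 100.0     # distance to the person / along the beach in metres (assumed)
v1, v2 = 5.0, 1.0      # running and swimming speeds in m/s (assumed)

def total_time(x):
    return x / v1 + math.sqrt((b - x)**2 + a**2) / v2

res = minimize_scalar(total_time)   # unconstrained: x may be any number
print(res.x, res.fun)               # optimal jump-in point and minimal time
```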

  
  You need to build a rectangular fence in front of your house for a herb garden.
  Your house will make up one side of the rectangle, so you only need to build three
  sides. Suppose you have 10 m of wire. What is the maximum area of the herb garden you can
  wall in?
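If you want to experiment numerically with this exercise, the following sketch maximizes the area under the assumption that the side parallel to the house has length $x$ and the two other sides have length $y$, so that $x + 2y = 10$; setting up this model is itself part of the exercise.

```python
# Numerical exploration of the fence exercise (not a proof). We assume the
# side along the house has length x and the two other sides length y, so
# the wire constraint is x + 2*y = 10 and the area is x*y.
from scipy.optimize import minimize_scalar

def neg_area(y):
    x = 10 - 2 * y        # eliminate x using the wire constraint
    return -(x * y)       # minimize the negative area = maximize the area

res = minimize_scalar(neg_area, bounds=(0, 5), method="bounded")
y_opt = res.x
print(10 - 2 * y_opt, y_opt, -res.fun)   # optimal x, y and maximal area
```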
4.2 General definition
An optimization problem consists of a subset $S \subseteq \mathbb{R}^n$ and a function $f \colon S \to \mathbb{R}$. We will consider optimization problems in the context of minimization. Optimize in this situation means minimize.
In our most general setting an optimization problem looks like
\[
\text{minimize } f(x) \quad \text{subject to } x \in S,
\]
where $S$ and $D$ are subsets of $\mathbb{R}^n$ with $S \subseteq D$ and $f \colon D \to \mathbb{R}$ is a function.

A solution to the optimization problem is a vector $x^* \in S$, such that
\[
f(x^*) \leq f(x)
\]
for every $x \in S$. Here $x^*$ is called an optimum and $f(x^*)$ is called the optimal value. We will often write the optimization problem defined above in short form as
\[
\min \{ f(x) \mid x \in S \}.
\]
We have deliberately not included maximization problems in Definition 4.5. This is because a maximization problem, such as
\[
\text{maximize } f(x) \quad \text{subject to } x \in S, \tag{4.1}
\]
can be formulated as the minimization problem
\[
\text{minimize } -f(x) \quad \text{subject to } x \in S.
\]
Again, we will use the short notation
\[
\max \{ f(x) \mid x \in S \}
\]
for the maximization problem in (4.1). A solution to (4.1) is a vector $x^* \in S$, such that
\[
f(x) \leq f(x^*)
\]
for every $x \in S$. Again, $x^*$ is called an optimum and $f(x^*)$ the optimal value.
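The reformulation of a maximization problem as a minimization problem is easy to check numerically; here is a tiny sketch with an assumed example function.

```python
# Tiny sketch: maximizing f over an interval is the same as minimizing -f.
# The function and the interval below are assumed examples.
from scipy.optimize import minimize_scalar

def f(x):
    return -(x - 2.0)**2 + 3.0     # example function with maximum 3 at x = 2

res = minimize_scalar(lambda x: -f(x), bounds=(0.0, 5.0), method="bounded")
print(res.x, -res.fun)             # optimum x* (about 2) and optimal value (about 3)
```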
  
Suppose that the maximization problem
\[
\max \{ f(x) \mid x \in S \} \tag{4.3}
\]
is formulated as the minimization problem
\[
\min \{ -f(x) \mid x \in S \}. \tag{4.4}
\]
Show that $-v$ is the optimal value and $x^*$ the optimum for (4.3) if $v$ is the optimal value and $x^*$ the optimum for (4.4).
  
    Suppose that . Solve the optimization problem
4.3 Convex optimization
Particularly well behaved optimization problems are the convex ones. These are optimization problems where $S$ is a convex subset and $f$ a convex function in Definition 4.5. To define these concepts we first introduce the notion of a line in $\mathbb{R}^n$.
A line $L$ is a subset of the form
\[
L = \{ u + t v \mid t \in \mathbb{R} \},
\]
where $u, v \in \mathbb{R}^n$ with $v \neq 0$.
     
A line $L$ in the plane $\mathbb{R}^2$ is (usually) given by its equation
\[
y = a x + b. \tag{4.5}
\]
This means that it consists of points $(x, y)$ satisfying $y = a x + b$. Here $a$ can be interpreted as the slope of the line and $b$ the intersection with the $y$-axis.

What about all the points $(x, y)$ with $x = c$? Certainly they also deserve to be called a line. However, they do not satisfy an equation like (4.5). Informally, this line has infinite slope.

Therefore we introduce the parametric representation of a line: a line is the set of points of the form
\[
u + t v, \tag{4.6}
\]
where $t \in \mathbb{R}$, $u$ is any point on the line and $v$ is a non-zero (directional) vector.
   
  
Example of a line in $\mathbb{R}^2$ with (directional) vector $v$ through the point $u$.
Given two distinct points $u, v \in \mathbb{R}^n$ there is one and only one line passing through them. This line is given by
\[
\{ u + t (v - u) \mid t \in \mathbb{R} \}. \tag{4.7}
\]
How do we convert the line $y = a x + b$ in (4.5) to the parametric form (4.6)? Well, we know that the two distinct points
\[
\begin{pmatrix} 0 \\ b \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 1 \\ a + b \end{pmatrix}
\]
are on the line. Therefore it is given by
\[
\begin{pmatrix} 0 \\ b \end{pmatrix} + t \begin{pmatrix} 1 \\ a \end{pmatrix}
\]
by (4.7).
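The conversion between the two representations is easy to carry out numerically; the sketch below computes points on the line through two assumed example points.

```python
# Minimal sketch of the parametric representation u + t*v of the line
# through two distinct points. The two points are assumed examples,
# corresponding to y = a*x + b with a = 2 and b = 1.
import numpy as np

p = np.array([0.0, 1.0])      # the point (0, b)
q = np.array([1.0, 3.0])      # the point (1, a + b)

u, v = p, q - p               # point on the line and directional vector
for t in np.linspace(-1.0, 2.0, 7):
    print(u + t * v)          # points on the line for several values of t
```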
  
Compute the parametric representation of the line  through the points  and .
Also compute  and  in the representation  for .
  
What is the parametric representation of the line consisting of the points $(x, y)$ with $x = c$?
  
Show in Definition 4.9 that if $L$ is given by $u$ and $v$, then you might as well replace $v$ by $\lambda v$, where $\lambda$ is a real number and $\lambda \neq 0$. It gives the same line.
    
     
  Show that there is a unique line passing through two distinct points
     and that it is given by  and  in Definition 4.9.
  
    Do the points
    
    lie on the same line in ?
    
  
Show that the line through two distinct points $u, v \in \mathbb{R}^n$ is equal to the subset
\[
\{ (1 - t) u + t v \mid t \in \mathbb{R} \}.
\]
     
A convex subset $C \subseteq \mathbb{R}^n$ is a subset that contains the line segment between any two of its points $u, v \in C$, i.e.,
\[
(1 - t) u + t v \in C
\]
for every $u, v \in C$ and every number $t$ with $0 \leq t \leq 1$.

    Which of the subsets below are convex?
    
The points $(x, y)$ on a line $L$ in $\mathbb{R}^2$.
    
  
      A closed interval in  is a subset of the form
      
      for . Prove that  is a convex subset of .Hint
      
    Keep cool and just apply the definitions! First of all,  if and only if
Now pick any . We must show that if  and , then
You may also write this out as
implies that
Hint
implies that 
What is ?
  
Let $C_1$ and $C_2$ be convex subsets of $\mathbb{R}^n$. Prove that $C_1 \cap C_2$ is a convex subset of $\mathbb{R}^n$. Generalize this to show that if $C_1, \dots, C_m$ are any number of convex subsets of $\mathbb{R}^n$, then their intersection
\[
C_1 \cap \dots \cap C_m
\]
is a convex subset of $\mathbb{R}^n$. Is the union of two convex subsets necessarily convex?
     
A convex function is a function $f \colon C \to \mathbb{R}$ defined on a convex subset $C \subseteq \mathbb{R}^n$, such that
\[
f((1 - t) u + t v) \leq (1 - t) f(u) + t f(v)
\]
for every $u, v \in C$ and every number $t$ with $0 \leq t \leq 1$.

- 
    Let the function $f \colon \mathbb{R} \to \mathbb{R}$ be given by $f(x) = |x|$. Show that $f$ is a convex function.

    Hint

    Try a special case first.
- 
    Can you at this point prove that $f(x) = x^2$ is a convex function?

    Hint

    Simplify to an expression that has to be non-negative.
- 
    Using that $x^2$ is a convex function, prove that $x^4$ is a convex function.

    Hint

    Use that $x^4 = (x^2)^2$ and that $a \leq b$ implies $a^2 \leq b^2$ if $0 \leq a$ (here we really need $0 \leq a$, since for example $-2 \leq 1$, but $(-2)^2 \leq 1^2$ is not true) to conclude that $((1 - t) u^2 + t v^2)^2 \leq (1 - t) u^4 + t v^4$ for $0 \leq t \leq 1$.
- 
    It is a fact that $f(x) = x^3$ is not a convex function, but can you explain this using the definition of a convex function?

    Hint

    Try $u = -1$ and $v = 0$.
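The exercises above can also be explored numerically before proving anything; the sketch below samples random points and checks the convexity inequality (this gives evidence, not a proof).

```python
# Numerical sanity check (not a proof) of the convexity inequality
#   f((1-t)*u + t*v) <= (1-t)*f(u) + t*f(v)
# for a few functions on random samples.
import random

def looks_convex(f, trials=10_000, lo=-10.0, hi=10.0):
    for _ in range(trials):
        u, v = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        if f((1 - t) * u + t * v) > (1 - t) * f(u) + t * f(v) + 1e-9:
            return False          # found a violation of the inequality
    return True

print(looks_convex(lambda x: abs(x)))    # True: |x| satisfies the inequality
print(looks_convex(lambda x: x**2))      # True: x^2 satisfies the inequality
print(looks_convex(lambda x: x**3))      # False: a violation is found for x^3
```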
     
Let $f \colon C \to \mathbb{R}$ be a convex function. Then the subset
\[
\{ x \in C \mid f(x) \leq \alpha \}
\]
is a convex subset of $\mathbb{R}^n$, where $\alpha \in \mathbb{R}$.

Suppose that $f(u) \leq \alpha$ and $f(v) \leq \alpha$. Looking at Definition 4.18 we must prove that
\[
f((1 - t) u + t v) \leq \alpha
\]
for $0 \leq t \leq 1$. By the definition of $f$ being convex (Definition 4.23), it follows that
\[
f((1 - t) u + t v) \leq (1 - t) f(u) + t f(v).
\]
But, since $f(u) \leq \alpha$ and $f(v) \leq \alpha$ we have
\[
(1 - t) f(u) + t f(v) \leq (1 - t) \alpha + t \alpha = \alpha
\]
and therefore
\[
f((1 - t) u + t v) \leq \alpha.
\]
Therefore, $(1 - t) u + t v$ belongs to $\{ x \in C \mid f(x) \leq \alpha \}$ and this subset is convex.
We do not have the tools yet to prove the crucial result about convex optimization problems, but at least we can state it.

In hunting for optimal solutions to an optimization problem one is often stuck with a point $x_0$, which is optimal locally. This means that $f(x_0) \leq f(x)$ for every $x \in S$ that is sufficiently close to $x_0$ (we will explain what this means in the next chapter). The remarkable thing that happens in a convex optimization problem is that if $x_0$ is optimal locally, then it is a global optimum! It satisfies $f(x_0) \leq f(x)$ not only for $x$ close to $x_0$, but for every $x \in S$.
  
    Solve the optimization problem
    
    for  and .
    
4.4 Linear optimization
We will start this section with a concrete example.

A company produces two products $A$ and $B$. The product $A$ is selling for a certain price in USD and $B$ is selling for another price in USD. There are certain limited resources in the production of $A$ and $B$. Two raw materials $R_1$ and $R_2$ are needed along with employee work time. The production of one unit of $A$ requires a certain number of minutes of work time, one unit of $R_1$ and six units of $R_2$. The production of one unit of $B$ requires a certain number of minutes of work time, one unit of $R_1$ and eight units of $R_2$. There are limited amounts of employee work time, $R_1$ and $R_2$ available. These constraints in the production can be outlined in the diagram below. How many units of $A$ and of $B$ should the company produce to maximize its profit?

You can rewrite this question as an optimization problem. This optimization problem is a special case of linear optimization, which arguably is one of the most successful applications of mathematics (after the introduction of the simplex algorithm following World War II). We will give a taste of the mathematical setup here.

The simplest convex optimization problems are the linear ones. Recall that a linear function has the form
\[
f(x_1, \dots, x_n) = c_1 x_1 + \dots + c_n x_n
\]
for $c_1, \dots, c_n \in \mathbb{R}$. Usually we write this with matrix notation as
\[
f(x) = c^T x,
\]
where
\[
c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}
\quad\text{and}\quad
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]
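A production problem of this kind can be solved directly on a computer; below is a minimal sketch using scipy.optimize.linprog with assumed, illustrative prices and resource data (not the numbers from the example above).

```python
# Minimal sketch of a two-product production problem as a linear program.
# All numbers (prices, resource requirements, available amounts) are
# assumed, illustrative values -- not the data from the example above.
from scipy.optimize import linprog

p = [-30.0, -40.0]             # maximize 30*x + 40*y  <=>  minimize -30*x - 40*y

A = [[4.0, 6.0],               # minutes of work time per unit of A and B (assumed)
     [1.0, 1.0],               # units of the first raw material per unit (assumed)
     [6.0, 8.0]]               # units of the second raw material per unit (assumed)
b = [240.0, 50.0, 360.0]       # available work time and raw materials (assumed)

res = linprog(p, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)         # optimal production plan and maximal profit
```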
  Show that a linear function is convex.
  
A linear optimization problem is not about minimizing a linear function over an arbitrary convex subset. We choose the convex subset as an intersection of subsets of the form
\[
\{ x \in \mathbb{R}^n \mid a^T x \leq \beta \},
\]
where $a \in \mathbb{R}^n$ is a non-zero vector and $\beta$ a number, i.e., a linear optimization problem has the form
\[
\text{minimize } c^T x \quad \text{subject to } x \in P,
\]
where
\[
P = \{ x \in \mathbb{R}^n \mid a_1^T x \leq \beta_1, \dots, a_m^T x \leq \beta_m \} \tag{4.9}
\]
and $a_1, \dots, a_m \in \mathbb{R}^n$ and $\beta_1, \dots, \beta_m \in \mathbb{R}$.
Use a selection of previous exercises to show that the subset $P$ defined in (4.9) is a convex subset of $\mathbb{R}^n$.
  
Using matrix notation we write $P$ as
\[
P = \{ x \in \mathbb{R}^n \mid A x \leq b \},
\]
where $A$ is the $m \times n$ matrix with row vectors $a_1^T, \dots, a_m^T$ and
\[
b = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}.
\]
Here is a concrete example for $n = 2$. The optimization problem translates into matrix notation with explicit matrices $A$ and $b$ and a cost vector $c$. In this case it is helpful to draw the optimization problem in the plane $\mathbb{R}^2$. This is done below.
  
    Constraints pictured as shaded area above. Optimum occurs in a vertex (corner).
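The picture suggests a brute-force way of solving small problems of this kind: compute all intersection points of pairs of constraint lines, keep the feasible ones, and compare the objective values. The sketch below does this for an assumed example with three constraints.

```python
# Brute-force sketch for a small LP in the plane: the optimum of c^T x over
# {x : A x <= b} is attained at a vertex, found here by intersecting pairs
# of constraint lines. A, b and c below are assumed example data.
import itertools
import numpy as np

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])   # x + y <= 4, -x <= 0, -y <= 0
b = np.array([4.0, 0.0, 0.0])
c = np.array([-1.0, -2.0])                              # minimize -x - 2y

vertices = []
for i, j in itertools.combinations(range(len(b)), 2):
    Aij = A[[i, j]]
    if abs(np.linalg.det(Aij)) < 1e-12:
        continue                                        # parallel lines: no intersection point
    x = np.linalg.solve(Aij, b[[i, j]])
    if np.all(A @ x <= b + 1e-9):                       # keep feasible intersection points only
        vertices.append(x)

best = min(vertices, key=lambda x: float(c @ x))
print(best, float(c @ best))                            # optimal vertex and optimal value
```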
     
We will give a general (but rather slow) algorithm below for solving
  linear optimization problems. In fact it all boils down to solving
  systems of linear inequalities. Sometimes linear optimization is
  referred to as linear
    programming. The
  basic theory of linear programming was pioneered, among others, by
  one of the inventors of the modern computer, John von Neumann.

4.5 Fourier-Motzkin elimination
Fourier-Motzkin elimination is a classical method (dating back to 1826) for solving linear inequalities. It is also a key ingredient in an algorithm for solving linear optimization problems. I am convinced that the best way to explain this method is by way of an extended example. For more formalities you may consult Chapter 1 of my book Undergraduate Convexity.

Consider the linear optimization problem (4.10). We might as well rewrite it by adding an extra variable $z$ equal to the objective function. This enables us to reformulate the problem as follows: find the maximal value of $z$, such that there exists a point attaining this value of $z$ in the set of solutions to the system of inequalities (4.11). (An equality is logically equivalent to two inequalities: $a = b$ if and only if $a \leq b$ and $b \leq a$.)

We have the Gauss elimination method for solving systems of linear equations. How do we now solve (4.11), where we also have inequalities?

Well, at first we can actually do a Gauss elimination step by eliminating one of the variables using the equation defining $z$, i.e., by solving the equation for that variable and substituting. This is then inserted into the inequalities in (4.11), and we get a system of inequalities in the remaining variables. Now we only have inequalities left and we have to invent a trick for eliminating the next variable. Let us isolate it on one side of the inequality signs $\leq$ and $\geq$. Written a little differently, this is the same as (4.12).

Now the scene is set for the elimination. Listen carefully. First, the inequalities in (4.12) can be boiled down to the two inequalities (4.13), one collecting the lower bounds and one collecting the upper bounds for the variable, by using (repeatedly) that $a \leq c$ and $b \leq c$ hold if and only if $\max(a, b) \leq c$ for three numbers $a$, $b$ and $c$.

Then, finally, comes the (Fourier-Motzkin) elimination step: the existence of a solution to (4.13) is equivalent to the single inequality saying that the lower bound is less than or equal to the upper bound. This single inequality can be exploded or expanded (see Exercise 4.34) into the inequalities obtained by comparing each individual lower bound with each individual upper bound. Similarly to (4.12), we now isolate $z$ from these inequalities and find an upper bound for $z$.

Therefore the maximum in the optimization problem (4.10) is this upper bound on $z$. How do we now find numbers satisfying the constraints in the optimization problem (4.10) attaining this maximum?

This is simply done by inserting the maximal value of $z$ first in (4.13). Here you get two inequalities that determine the next variable. Since we had the equation defining $z$ from the very beginning, we then get the remaining variable as well, and we have the unique solution to the optimization problem.
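For experimentation, here is a compact Python sketch of a single Fourier-Motzkin elimination step; it follows the idea of the example above (normalize the coefficient of the variable to be eliminated and add each lower-bound row to each upper-bound row), and the small system at the end is an assumed toy example.

```python
# Sketch of one Fourier-Motzkin elimination step on a system A x <= b.
# Rows where x_k has positive coefficient give upper bounds on x_k, rows
# with negative coefficient give lower bounds; after normalization, adding
# each lower-bound row to each upper-bound row makes x_k cancel.
def fm_eliminate(A, b, k):
    pos, neg, zero = [], [], []
    for row, beta in zip(A, b):
        c = row[k]
        if c > 0:
            pos.append(([a / c for a in row], beta / c))     # x_k + ... <= ...
        elif c < 0:
            neg.append(([a / -c for a in row], beta / -c))   # -x_k + ... <= ...
        else:
            zero.append((list(row), beta))
    combined = [([p + n for p, n in zip(prow, nrow)], pb + nb)
                for prow, pb in pos for nrow, nb in neg]      # x_k cancels here
    new = zero + combined
    new_A = [[a for i, a in enumerate(row) if i != k] for row, _ in new]
    new_b = [beta for _, beta in new]
    return new_A, new_b

# Toy example (assumed): eliminate y from x + y <= 4, -y <= 0, -x <= 0.
A = [[1.0, 1.0], [0.0, -1.0], [-1.0, 0.0]]
b = [4.0, 0.0, 0.0]
print(fm_eliminate(A, b, 1))   # remaining inequalities in x: -x <= 0 and x <= 4
```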
What is the solution if we replace Maximize with Minimize in the optimization problem (4.10)?
Prove the following: Let $c_1, \dots, c_n$ and $b$ be numbers. Then
\[
\max\{ c_1, \dots, c_n \} \leq b
\]
if and only if the $n$ inequalities
\[
c_1 \leq b, \quad \dots, \quad c_n \leq b
\]
are satisfied.
The following is Exercise 1.8 from my book Undergraduate Convexity.

A vitamin pill $P$ is produced using two ingredients $M_1$ and $M_2$. The pill needs to satisfy four constraints for the vital vitamins $V_1$ and $V_2$: it must contain at least a prescribed minimum and at most a prescribed maximum number of milligrams of $V_1$, and similarly for $V_2$. The ingredient $M_1$ contains a given number of milligrams of $V_1$ and of $V_2$ per gram. The ingredient $M_2$ contains a given number of milligrams of $V_1$ and of $V_2$ per gram.

Let $x$ denote the amount of $M_1$ and $y$ the amount of $M_2$ (measured in grams) in the production of a vitamin pill. Write down a system of linear inequalities in $x$ and $y$ describing the constraints above.

We want a vitamin pill of minimal weight satisfying the constraints. How many grams of $M_1$ and $M_2$ should we mix? Use Fourier-Motzkin elimination to solve this problem.

Check your solution by modifying the input to the Sage code in Example 4.32 using Remark 4.6. One may also force minimization by inserting the option

LP = MixedIntegerLinearProgram(maximization=False, solver = "GLPK")

in Example 4.32.
4.6 Application in machine learning and data science
To start with, consider a toy example of a machine learning problem: we wish to tell the gender of a person based on a data point consisting of the height and weight of the person.

To do this we train our model by measuring the height and weight of a lot of people. Each of these measured data points is labeled female or male according to the gender of the person.
