6 Convex functions

In this chapter we will dive deeper into convex functions. The main focus will be on (differentiable) convex functions defined on intervals (convex subsets) of the real numbers, i.e., (differentiable) convex functions of just one variable. Along the way, differentiability is formally introduced. I will assume that you are familiar with differentiation in an operational manner.

6.1 Strictly convex functions

Below we strengthen Definition 4.24 of a convex function.
Let $C \subseteq \mathbb{R}^n$ be a convex subset. A strictly convex function is a convex function $f: C \rightarrow \mathbb{R}$, such that
$$f((1 - t) u + t v) < (1-t) f(u) + t f(v) \tag{6.1}$$
for every number $t$ with $0 < t < 1$ and every $u, v \in C$ with $u \neq v$.
The strict inequality in (6.1) collapses to an equality if $t = 0$ or $t = 1$. For example, if $t = 0$, then the left hand side of (6.1) is $f(u)$ and the right hand side is $(1-0) f(u) + 0 \cdot f(v) = f(u)$ as well.
Definition 6.1 is illustrated below for a function $f: \mathbb{R} \rightarrow \mathbb{R}$. Here both $u$ and $v$ are real numbers (that is, $n = 1$ in Definition 6.1). The (red) line segment between $(u, f(u))$ and $(v, f(v))$ lies strictly ($<$) above the (black) graph of $f$:
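As a quick illustration (the function here is chosen for concreteness and is not necessarily the one used in the exercises below), $f(x) = x^2$ is strictly convex: for $u \neq v$ and $0 < t < 1$,
$$(1-t) u^2 + t v^2 - \big((1-t) u + t v\big)^2 = t(1-t)(u - v)^2 > 0,$$
so the defining inequality (6.1) holds strictly.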
LLM
💬

Please explain patiently the definition below to me. It seems that it is also valid
for functions defined on vectors in the plane ($n=2$). Give concrete examples of this.
Test me with a few questions in the end.
'''
Let $C \subseteq \mathbb{R}^n$ be a convex subset.
A \emph{strictly convex function} is a convex function $f: C\rightarrow \mathbb{R}$, such that
\begin{equation}
f((1 - t) u + t v) < (1-t) f(u) + t f(v)
\end{equation}
for every number $t$ with $0< t < 1$ and every $u, v\in C$ with $u\neq v$. 
'''

Consider the line (function) $\ell: \mathbb{R} \rightarrow \mathbb{R}$ given by
$$\ell(x) = \alpha x + \beta$$
for fixed $\alpha, \beta \in \mathbb{R}$. This function is convex, since we can formally write for every $0 \leq t \leq 1$ and every $u, v \in \mathbb{R}$:
$$\ell((1-t) u + t v) = \alpha\big((1-t) u + t v\big) + \beta = (1-t)(\alpha u + \beta) + t(\alpha v + \beta) = (1-t)\ell(u) + t\,\ell(v). \tag{6.2}$$
However, the computation in (6.2) also shows why there is no chance that $\ell$ is strictly convex. Intuitively, the graph of a convex function needs to bend and curve a bit for the function to be strictly convex. No line segments should occur in its graph.
Let $f: C \rightarrow \mathbb{R}$ be a convex function. Show that $f$ is strictly convex if and only if
$$f((1-t) u + t v) = (1-t) f(u) + t f(v)$$
for some $t$ with $0 < t < 1$ implies that $u = v$.
Give an example of a non-constant convex function, which is not strictly convex. Show in detail that is a strictly convex function.
Hint
Look back to the relevant part of Exercise 4.26 for dealing with .

6.2 Why are convex functions interesting?

We begin this section by giving the following result without proof.
A convex function $f: C \rightarrow \mathbb{R}$ defined on an open convex subset $C \subseteq \mathbb{R}^n$ is continuous.
Give an example showing that Theorem 6.7 is not true if the convex function is defined on a closed convex subset.
Hint
Try to come up with an example like . Look at the end point .
Hint
Well, try out
Let us now define precisely what is meant by a local vs a global minimum for a function.
Let $f: S \rightarrow \mathbb{R}$ be a function, where $S \subseteq \mathbb{R}^n$ is an arbitrary subset (not necessarily convex, open or closed). Then $x_0 \in S$ is called a local minimum for $f$ if
$$f(x_0) \leq f(x)$$
for every $x \in S$, which is sufficiently close to $x_0$. Being sufficiently close to $x_0$ means that $x$ satisfies
$$|x - x_0| < \delta$$
for some fixed $\delta > 0$.
In a much stronger notion, $x_0$ is called a global minimum if
$$f(x_0) \leq f(x)$$
for every $x \in S$ (not just locally).
Graph of a function defined on an interval. This function has a local minimum, which is not a global minimum.
Give an example of a local minimum that is not a global minimum for a precisely specified function. Also give an example of a global minimum, which is not uniquely defined (again for a precisely specified function). Uniquely defined means that there is precisely one $x_0$, such that $f(x_0)$ is minimal.
We might as well have talked about maximum instead of minimum above.
Reformulate Definition 6.9 in order to define a local and a global maximum.
A local extremum is a point $x_0$, which is either a local minimum or a local maximum.
Convex functions are interesting, because of the local nature of the minimization problem
$$\min f(x) \quad\text{subject to } x \in C. \tag{6.3}$$
If you run into a local minimum in (6.3), then you are sure that it also is a global minimum! This is the content of the result below.
Let $f: C \rightarrow \mathbb{R}$ be a convex function defined on a convex subset $C \subseteq \mathbb{R}^n$. If $x_0 \in C$ is a local minimum, then $x_0$ is a global minimum. If $f$ is strictly convex, then a global minimum for $f$ is unique.
By the definition of local minimum in Definition 6.9, there exists $\delta > 0$, such that $f(x_0) \leq f(x)$, when $x \in C$ and $|x - x_0| < \delta$. Suppose that $x_0$ is not a global minimum. Then there exists $z \in C$ with $f(z) < f(x_0)$. Consider the point
$$x_t = (1-t) x_0 + t z \in C,$$
where $0 < t < 1$. Then
$$f(x_t) \leq (1-t) f(x_0) + t f(z) < (1-t) f(x_0) + t f(x_0) = f(x_0).$$
Since $|x_t - x_0| = t\,|z - x_0|$, we can choose $t$ sufficiently small such that $|x_t - x_0| < \delta$, implying $f(x_0) \leq f(x_t)$, since $x_0$ is a local minimum. This contradicts that $f(x_t) < f(x_0)$ for every $0 < t < 1$. Let $f$ be strictly convex and let $x_0$ be a global minimum for $f$. If $y_0 \in C$, $y_0 \neq x_0$ and $f(y_0) = f(x_0)$, then
$$f((1-t) x_0 + t y_0) < (1-t) f(x_0) + t f(y_0) = f(x_0)$$
for $0 < t < 1$. This would contradict the global minimality of $x_0$, since $(1-t) x_0 + t y_0 \in C$ for $0 < t < 1$.
The following little result turns out to be very useful and also very intuitive and drawable! It is a key component in characterizing convex differentiable functions in terms of the derivative $f'$. We will not give the proof here.
Let $f: I \rightarrow \mathbb{R}$ be a convex function on an interval $I \subseteq \mathbb{R}$. Then
$$\frac{f(y) - f(x)}{y - x} \leq \frac{f(z) - f(x)}{z - x} \leq \frac{f(z) - f(y)}{z - y}$$
for $x < y < z$ in $I$.
The result in Lemma 6.14 is depicted above. A formal proof can be given from first principles only using Definition 4.24 .

6.3 Differentiable functions

To appreciate the depth of the notion of differentiability, you should read the story (joke, actually) in the second paragraph of section 8-2 in volume I of the famous Feynman Lectures on Physics. Below is a photograph of the master explainer in action.

6.3.1 Definition

Let $f: (a, b) \rightarrow \mathbb{R}$ be a function defined on the open interval $(a, b)$. The notion of $f$ being differentiable at a point $x_0 \in (a, b)$ can be gleaned from the drawing below,
where we informally let $x$ approach $x_0$ and look at the limiting value of the slope
$$\frac{f(x) - f(x_0)}{x - x_0}.$$
Newton used to say, many hundred years ago, that the derivative of $f$ at $x_0$ is the value of this slope just before $x$ becomes $x_0$. In modern day mathematical parlance, this translates into the existence of (a slope) $c \in \mathbb{R}$, such that
$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = c.$$
We will use the equivalent operational definition below in terms of a function $\epsilon$ defined around $0$, continuous at $0$ and with $\epsilon(0) = 0$. This looks difficult, but it is actually a clever way of approaching differentiability (and perhaps more in the spirit of Newton).
The function $f: (a, b) \rightarrow \mathbb{R}$ is differentiable at $x_0 \in (a, b)$ if there exists
  1. $c \in \mathbb{R}$,
  2. $\delta > 0$ with $x_0 - \delta, x_0 + \delta \in (a, b)$, i.e., $a + \delta < x_0$ and $x_0 < b - \delta$,
  3. a function $\epsilon: (-\delta, \delta) \rightarrow \mathbb{R}$ continuous at $0$ with $\epsilon(0) = 0$,
such that
$$f(x_0 + h) - f(x_0) = c\, h + \epsilon(h)\, h \tag{6.4}$$
for every $h \in (-\delta, \delta)$.
The number $c$ is denoted $f'(x_0)$ and called the derivative of $f$ at $x_0$; $f$ is called differentiable if it is differentiable at every $x_0 \in (a, b)$.
LLM
💬

Please explain the definition of differentiability given below. Illustrate
by a few examples and quiz me afterwards.
'''
The function $f: (a, b)\rightarrow \RR$ is differentiable at $x_0\in (a, b)$ if there exists
\begin{enumerate}[(i)]
\item
$c\in \RR$
\item
$\delta > 0$ with $x_0 - \delta, x_0 + \delta\in (a, b)$ i.e., $a + \delta < x_0$ and $x_0< b-\delta$.
\item
A function $\epsilon: (-\delta, \delta) \rightarrow \RR$ continuous at $0$ with $\epsilon(0) = 0$,
\end{enumerate}
such that
\begin{equation}\label{operational}
f(x_0 + h) - f(x_0) = c h + \epsilon(h) h
\end{equation}
for every $h\in (-\delta, \delta)$.

The number $c$ is denoted $f'(x_0)$ and called \emph{the derivative} of
$f$ at $x_0$; $f$ is called \emph{differentiable} if
it is differentiable at every $x_0\in (a, b)$.  
'''

If a function $f$ is differentiable, we get a new function $f'$ giving the (first) derivative $f'(x)$ at a point $x$ as output. We may ask again if this function is differentiable. If this is so, we may define a function given by $f''(x) = (f')'(x)$ called the second derivative. This procedure may be continued. We use the notation $f^{(n)}$ for the $n$-th derivative.
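For instance (a simple illustration of the notation, with a function chosen here for concreteness): if $f(x) = x^4$, then
$$f'(x) = 4x^3, \quad f''(x) = 12x^2, \quad f^{(3)}(x) = 24x, \quad f^{(4)}(x) = 24, \quad f^{(n)}(x) = 0 \text{ for } n \geq 5.$$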
Let us apply Definition 6.16 to the function at the point . Here
Here you immediately see that with (and ) in Definition 6.16 .
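As a further illustration of Definition 6.16 (the function $g$ below is chosen for concreteness and is not necessarily the example intended above), consider $g(x) = x^3$ at a point $x_0$:
$$g(x_0 + h) - g(x_0) = (x_0 + h)^3 - x_0^3 = 3 x_0^2\, h + \big(3 x_0 h + h^2\big)\, h,$$
so Definition 6.16 is satisfied with $c = 3 x_0^2$ and $\epsilon(h) = 3 x_0 h + h^2$, which is continuous at $0$ with $\epsilon(0) = 0$. Hence $g'(x_0) = 3 x_0^2$.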
Use Definition 6.16 to formally show that if .
A differentiable function is continuous as is shown in the following result.
If the function $f$ is differentiable at $x_0$, then it is continuous at $x_0$.
That $f$ is continuous at $x_0$ means (recall Definition 5.48) that to every $\epsilon > 0$, we may find $\delta > 0$ so that
$$|x - x_0| < \delta \Rightarrow |f(x) - f(x_0)| < \epsilon. \tag{6.5}$$
We are assuming that $f$ is differentiable at $x_0$, so according to Definition 6.16, there exists a number $c$ so that (with $h = x - x_0$)
$$f(x) - f(x_0) = c\,(x - x_0) + \epsilon(x - x_0)\,(x - x_0).$$
I will not write every detail out here, but you can see from the formula above that $|f(x) - f(x_0)| \leq K\,|x - x_0|$ for some number $K$, when $|x - x_0|$ is sufficiently small. This gives a $\delta$ that can be used in (6.5).
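For completeness, one way to fill in the omitted detail (this particular bookkeeping is mine, under the assumptions of Definition 6.16): since the function $\epsilon$ from Definition 6.16 is continuous at $0$ with $\epsilon(0) = 0$, there is a $\delta_1 > 0$ such that $|\epsilon(x - x_0)| \leq 1$ when $|x - x_0| < \delta_1$. Then
$$|f(x) - f(x_0)| \leq (|c| + 1)\,|x - x_0| \quad\text{for } |x - x_0| < \delta_1,$$
so given $\epsilon > 0$ one may take $\delta = \min\big(\delta_1, \tfrac{\epsilon}{|c| + 1}\big)$ in (6.5).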
The ReLU function $\mathrm{ReLU}(x) = \max(0, x)$ is an example of a function, which is continuous, but not differentiable at $0$. This is closely related to its sharp corner there.
As mentioned in these notes, the ReLU function plays a prominent role as an activation function in neural networks.
Show precisely that the ReLU function is not differentiable at $0$.

6.3.2 Formulas

In operating with differentiable functions you are supposed to draw on your previous knowledge. I have summarized some of this knowledge below (even though we will give hints below as to how to prove some of the rules).
  1. If , where , then
    .
  2. If , where , then
  3. If , then
  4. If , then
    Here denotes the logarithm with base .
  5. If , then
  6. If , then
  7. If $f$ and $g$ are differentiable functions, then the derivative of their product is
    $(f g)'(x) = f'(x)\, g(x) + f(x)\, g'(x)$.
  8. If $f$ and $g$ are differentiable functions, then the derivative of their quotient is
    $\left(\dfrac{f}{g}\right)'(x) = \dfrac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}$ (where $g(x) \neq 0$).
  9. If $f$ and $g$ are composable differentiable functions, then the derivative of their composite is
    $(f \circ g)'(x) = f'(g(x))\, g'(x)$.
Suppose that . What is

6.3.3 The derivative of a product

From high school you know that the derivative of a product of two functions $f$ and $g$ is given by the formula
$$(f g)'(x) = f'(x)\, g(x) + f(x)\, g'(x). \tag{6.6}$$
We can use the $\epsilon$-definition (6.4) to derive the product rule in (6.6). The computation below is a bit cumbersome, but actually quite doable. We assume to begin with that $f$ and $g$ are differentiable at $x_0$ according to (6.4), i.e.,
$$f(x_0 + h) - f(x_0) = f'(x_0)\, h + \epsilon_1(h)\, h \quad\text{and}\quad g(x_0 + h) - g(x_0) = g'(x_0)\, h + \epsilon_2(h)\, h.$$
Then we start the computation:
$$\begin{aligned}
f(x_0 + h)\, g(x_0 + h) - f(x_0)\, g(x_0) &= \big(f(x_0) + f'(x_0) h + \epsilon_1(h) h\big)\big(g(x_0) + g'(x_0) h + \epsilon_2(h) h\big) - f(x_0)\, g(x_0) \\
&= \big(f'(x_0)\, g(x_0) + f(x_0)\, g'(x_0)\big)\, h + \epsilon(h)\, h,
\end{aligned}$$
where the function
$$\epsilon(h) = f(x_0)\,\epsilon_2(h) + g(x_0)\,\epsilon_1(h) + \big(f'(x_0) + \epsilon_1(h)\big)\big(g'(x_0) + \epsilon_2(h)\big)\, h \tag{6.8}$$
is seen to be continuous at $0$ with $\epsilon(0) = 0$. The end result of this computation shows that $f g$ is differentiable at $x_0$ with
$$(f g)'(x_0) = f'(x_0)\, g(x_0) + f(x_0)\, g'(x_0),$$
again according to (6.4).
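As a quick numerical sanity check of (6.6) (a minimal sketch; the functions, point and step size below are arbitrary illustrative choices, not taken from the notes), one can compare the product rule against a finite-difference approximation in Python:

import math

def derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

# Two illustrative differentiable functions with known derivatives.
f, df = math.sin, math.cos
g, dg = math.exp, math.exp

x0 = 0.7
product = lambda x: f(x) * g(x)

numerical = derivative(product, x0)              # (f*g)'(x0) via finite differences
product_rule = df(x0) * g(x0) + f(x0) * dg(x0)   # f'(x0) g(x0) + f(x0) g'(x0)
print(numerical, product_rule)                   # the two numbers agree to several decimals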
Show that the function $\epsilon$ defined in (6.8) satisfies the relevant conditions in Definition 6.16.
The formula for the derivative of a fraction, i.e.,
$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2},$$
can be derived using a neat little trick. This is the topic of the following exercise.
Show how the product rule may be used to derive the rule for finding the derivative of a fraction:
$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g(x)^2}.$$
Hint

6.3.4 The one variable chain rule

The formula for the derivative of a composite function $f \circ g$ is given by
$$(f \circ g)'(x) = f'(g(x))\, g'(x), \tag{6.10}$$
where $x$ is in the domain of $g$ and $g(x)$ is in the domain of $f$. Let us see how (6.4) applies in showing this.
Suppose that $g$ is differentiable at $x_0$ and $f$ is differentiable at $g(x_0)$. Then we can mess around a bit with the $\epsilon$-functions $\epsilon_f$ for $f$ and $\epsilon_g$ for $g$ for the composite function $f \circ g$ around $x_0$:
$$\begin{aligned}
f(g(x_0 + h)) - f(g(x_0)) &= f'(g(x_0))\big(g(x_0 + h) - g(x_0)\big) + \epsilon_f\big(g(x_0 + h) - g(x_0)\big)\big(g(x_0 + h) - g(x_0)\big) \\
&= f'(g(x_0))\, g'(x_0)\, h + E(h)\, h,
\end{aligned}$$
where (take a deep breath)
$$E(h) = f'(g(x_0))\,\epsilon_g(h) + \epsilon_f\Big(\big(g'(x_0) + \epsilon_g(h)\big)\, h\Big)\,\big(g'(x_0) + \epsilon_g(h)\big).$$
Here $E$ is seen to be continuous at $0$ with $E(0) = 0$, i.e., the composition $f \circ g$ is differentiable at $x_0$ with derivative
$$(f \circ g)'(x_0) = f'(g(x_0))\, g'(x_0).$$
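For instance (an illustrative use of (6.10), with functions chosen here for concreteness): with $g(x) = x^2$ and $f(y) = \sin(y)$,
$$\frac{d}{dx}\sin(x^2) = \cos(x^2)\cdot 2x.$$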
The formula (6.10) is extremely important and useful. We give some applications in the exercises below.
For the function for , you already know that . Show that if you define the function by
for an arbitrary number , then .
Compute the derivative of the function given by
using only paper and pencil! You can check your result afterwards using a computer.
Suppose that $f$ and $g$ are inverse functions, i.e.,
$$f(g(x)) = x.$$
If you know the derivative of $f$, how can you use the chain rule to get the derivative of $g$? Illustrate with examples like and , and .
Suppose that $f: \mathbb{R} \rightarrow \mathbb{R}$ is a convex function. We know that $f$ is continuous, but is $f$ differentiable at every point $x_0 \in \mathbb{R}$?
Hint
Nope. This is wrong. Come up with a convex function $f$ and a point $x_0$, such that $f$ is not differentiable at $x_0$.

6.3.5 The Newton-Raphson method for finding roots

We begin this section with a surprising example.
Suppose that $a > 0$ and we wish to compute $\sqrt{a}$. To do this we may focus on the quadratic equation $x^2 = a$ and attempt to compute an approximate value $x$, such that $x^2$ is close to $a$. Let me at this point disclose that there is a very effective iterative scheme for doing this. You start by putting $x_0$ equal to an initial (positive) guess and then iterate using the formula
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) \tag{6.11}$$
to get better and better approximations to $\sqrt{a}$.

The formula in (6.11) is derived from
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},$$
where $f(x) = x^2 - a$.
You can try out (6.11) below.
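If you prefer to experiment on your own, here is a minimal Python sketch of the iteration (6.11); the value of $a$, the starting guess and the number of iterations are arbitrary illustrative choices:

a = 2.0          # we want to approximate sqrt(a)
x = 1.0          # initial (positive) guess
for n in range(6):
    x = 0.5 * (x + a / x)   # the iteration (6.11)
    print(n + 1, x)
# after a handful of iterations x agrees with sqrt(2) to machine precision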
I have been in complete awe of the Newton-Raphson method since my early youth. It is an algorithm, where the notion of differentiability really shines.
The method comes from Definition 6.16 with $h = x - x_0$: we are assuming that $x_0$ is very close to a point $x$, where $f(x) = 0$. Then
$$0 = f(x) = f(x_0 + h) = f(x_0) + f'(x_0)\, h + \epsilon(h)\, h.$$
Ignoring the very small number $\epsilon(h)\, h$ and solving this equation for $x = x_0 + h$ we get
$$x = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
In the Sage window below, I have entered the algorithm starting in running ten iterations for finding a zero for .
Give an example, where the Newton-Raphson method cycles between points and never finds the desired zero. Perhaps a drawing will help here.
The Newton-Raphson method converges rapidly in most cases. Of course, it breaks down violently if it runs into a critical point, i.e., a point $x_0$, such that $f'(x_0) = 0$.
Below is some interactive Sage code for experimenting with Newton's method.
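As a minimal non-interactive Python sketch in the same spirit (the example function, starting point and iteration count are illustrative choices of mine, not the contents of the interactive cell):

def newton(f, df, x0, iterations=10):
    # Repeatedly apply x -> x - f(x)/df(x) starting from x0.
    x = x0
    for _ in range(iterations):
        x = x - f(x) / df(x)
    return x

# Example: a zero of f(x) = x**3 - 2*x - 5, so df(x) = 3*x**2 - 2.
root = newton(lambda x: x**3 - 2*x - 5, lambda x: 3*x**2 - 2, x0=2.0)
print(root, root**3 - 2*root - 5)   # the second number is (numerically) zero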
The formula (see button in Example 1.83 ) for the (monthly) payment on a (car) loan over payments with a down payment of and an interest rate of (per payment or term) is given by the formula
There is no explicit formula for calculating given and . Here the Newton-Raphson method is invaluable for estimating by approximating a zero for the function
Your bank promises you a loan of DKK with yearly payments of DKK over years. At the same time it claims that its interest rate is very favorable at only %. Here the bank is wrong! What is the real interest rate? How much money do you save (compared to the original offer from the bank) if you insist that the bank offers you the promised interest rate of %?
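As a hedged sketch of the general approach: assuming the standard annuity relation $y = P\,\dfrac{r}{1 - (1 + r)^{-n}}$ between the principal $P$, the payment $y$ per term, the interest rate $r$ per term and the number of payments $n$ (this formula and all numbers below are illustrative assumptions of mine, not the formula from Example 1.83 or the figures in the exercise), Newton-Raphson can solve for $r$:

def payment(P, r, n):
    # Payment per term for a loan of P over n terms at rate r per term
    # (standard annuity formula -- an assumption, see the remark above).
    return P * r / (1 - (1 + r) ** (-n))

def real_rate(P, y, n, r0=0.05, iterations=20, dr=1e-6):
    # Solve payment(P, r, n) = y for r with Newton's method, using a
    # finite-difference approximation of the derivative with respect to r.
    r = r0
    for _ in range(iterations):
        g = payment(P, r, n) - y
        dg = (payment(P, r + dr, n) - payment(P, r - dr, n)) / (2 * dr)
        r = r - g / dg
    return r

# Illustrative numbers only: a loan of 100000 repaid with 12000 per year over 12 years.
print(real_rate(100000.0, 12000.0, 12))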

6.3.6 Critical points and extrema

A critical point for a differentiable function $f$ is a point $x_0$ with
$$f'(x_0) = 0.$$
The crucial result here is the following. It seems to date back to Fermat (see Fermat's theorem).
Let $f: (a, b) \rightarrow \mathbb{R}$ be a differentiable function. If $x_0$ is a local extremum for $f$, then $x_0$ is a critical point, i.e., $f'(x_0) = 0$.
Suppose that $x_0$ is a local maximum and that
$$f(x_0 + h) - f(x_0) = c\, h + \epsilon(h)\, h$$
according to (6.4). If $c > 0$, then we can choose $h > 0$ sufficiently small, such that $c + \epsilon(h) > 0$, since $\epsilon(0) = 0$ and $\epsilon$ is continuous in $0$. Therefore
$$f(x_0 + h) - f(x_0) = \big(c + \epsilon(h)\big)\, h > 0,$$
contradicting that $x_0$ is a local maximum. The proof is similar for $c < 0$ and if $x_0$ is a local minimum.
Is the converse of the above lemma true, i.e., if $f'(x_0) = 0$, is $x_0$ then a local extremum?
Theorem 6.37 below is called the mean value theorem. It is a consequence of Lemma 6.35 and the extremely important Theorem 5.66 about continuous functions on compact subsets attaining their maxima and minima!
Let $f: [a, b] \rightarrow \mathbb{R}$ be continuous and differentiable on $(a, b)$. Then there exists $\xi \in (a, b)$ such that
$$f'(\xi) = \frac{f(b) - f(a)}{b - a}.$$
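As a small illustration (the function and interval are chosen here for concreteness): for $f(x) = x^2$ on $[0, 1]$, the mean value theorem asks for $\xi \in (0, 1)$ with
$$f'(\xi) = 2\xi = \frac{f(1) - f(0)}{1 - 0} = 1,$$
which is satisfied by $\xi = \tfrac{1}{2}$.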

6.3.7 Increasing functions

The definition below is much simpler than the definition of differentiability.
A function $f: S \rightarrow \mathbb{R}$ with $S \subseteq \mathbb{R}$ is called increasing if
$$x \leq y \Rightarrow f(x) \leq f(y)$$
and strictly increasing if
$$x < y \Rightarrow f(x) < f(y)$$
for $x, y \in S$.
LLM
💬

Explain the definition below to me. Give some examples and test me.
\begin{definition}
A function $f:S\rightarrow \mathbb{R}$ with $S\subseteq \mathbb{R}$ is called
  \emph{increasing} if
  \begin{equation*}
    x\leq y\Rightarrow f(x) \leq f(y)
  \end{equation*}
  and \emph{strictly increasing} if 
  \begin{equation*}
    x< y\Rightarrow f(x) < f(y)
  \end{equation*}
  for $x, y\in S$.
\end{definition}

Give an example of an increasing function. Give an example of an increasing function that is not strictly increasing.
The following very important result is a consequence of Theorem 6.37. You probably already know this result from your previous (Danish) education (monotoniforhold!).
Let $f: (a, b) \rightarrow \mathbb{R}$ be a differentiable function. Then $f$ is increasing if and only if $f'(x) \geq 0$ for every $x \in (a, b)$. If $f'(x) > 0$ for every $x \in (a, b)$, then $f$ is strictly increasing.
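For instance (an illustrative example, chosen here for concreteness): $f(x) = x^3 + x$ satisfies
$$f'(x) = 3x^2 + 1 > 0 \quad\text{for every } x,$$
so $f$ is strictly increasing by the last statement of the theorem. Note that the converse of that last statement fails in general: a strictly increasing differentiable function may have derivative $0$ at isolated points.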
Which of the properties below are true for the function given by
  1. It is differentiable.
  2. It is continuous.
  3. It has a global minimum.
  4. It has a global maximum.
  5. It has exactly one critical point.
  6. It has a local maximum.
  7. It has a local minimum.
  8. It is increasing.
  9. It has three zeros.
  10. Its derivative has two zeros.
  11. It is convex.
Show that is strictly increasing i.e.,
Hint
but why is always except when ?
Suppose that $f: [a, b] \rightarrow \mathbb{R}$ is a continuous function, such that $f$ is differentiable on the open interval $(a, b)$. Is $f$ increasing on $[a, b]$ if $f'(x) \geq 0$ for every $x \in (a, b)$?
Is it possible for a strictly increasing function $f: \mathbb{R} \rightarrow \mathbb{R}$ to be bounded, i.e., does there exist a (positive) number $M$, such that $|f(x)| \leq M$ for every $x \in \mathbb{R}$?
Hint
Have a look at

6.4 Taylor polynomials

If $x_0$ is a critical point for $f$ we cannot conclude that $x_0$ is a local extremum. We know that $f'(x_0) = 0$ and we can get more information out of $f$ by exploring the signs of the higher derivatives $f''(x_0), f^{(3)}(x_0), \dots$
Suppose that
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
is a polynomial, then
$$f(x) = f(0) + f'(0)\, x + \frac{f''(0)}{2!}\, x^2 + \cdots + \frac{f^{(n)}(0)}{n!}\, x^n. \tag{6.12}$$
For sufficiently nice functions we can play this game ad infinitum. In fact, in this way we get the beautiful infinite (Taylor) series
$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\, x^k.$$
If $f$ is an $n$ times differentiable function defined at $0$, we call the polynomial in (6.12) the Taylor polynomial about the point $0$ of degree $n$ associated with $f$. Similarly, one may also define the Taylor polynomial of order $n$ about a point $x_0$ by
$$f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n.$$
Taylor polynomials can be used to approximate more complicated functions such as and with a well defined error term. This is cool classical mathematics. Unfortunately we do not have time to go deeper into Taylor's theorem, which states this in precise terms.
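As a concrete illustration (a function chosen here for concreteness, not necessarily one of those alluded to above): the Taylor polynomial of $\sin(x)$ about $0$ of degree $5$ is
$$x - \frac{x^3}{3!} + \frac{x^5}{5!},$$
obtained by computing $\sin^{(k)}(0)$ for $k = 0, 1, \dots, 5$ and inserting into (6.12).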
Compute the Taylor polynomial for up to degree .
Suppose you have a number that satisfies
Can you make sense of the formula
using Taylor polynomials?
In the context of optimization, the following result becomes important. We will not give the proof, but only notice that Theorem 6.37 also here plays an important role.
Let $x_0$ be a critical point of an $n$ times differentiable function $f$, such that $f^{(n)}$ is a continuous function,
$$f'(x_0) = f''(x_0) = \cdots = f^{(n-1)}(x_0) = 0$$
and $f^{(n)}(x_0) \neq 0$. If $n$ is even, then $x_0$ is a local minimum if $f^{(n)}(x_0) > 0$ and a local maximum if $f^{(n)}(x_0) < 0$. If $n$ is odd, then $x_0$ is not a local extremum.
Let us apply Theorem 6.47 to the quadratic function
$$f(x) = a x^2 + b x + c,$$
where $a \neq 0$. Here $f'(x) = 2 a x + b$ and
$$x_0 = -\frac{b}{2a}$$
is a critical point (why?). Since
$$f''(x_0) = 2a \neq 0,$$
we see that $x_0$ is a local minimum if $a > 0$ and a local maximum if $a < 0$.
Have you seen Example 6.48 elsewhere, perhaps in a more geometric setting? What type of curve is the graph of $f$? Here you may consult your previous mathematical knowledge.
What is the outcome, when you apply Theorem 6.47 to the function at ?
Show that is a critical point of the function defined by
Use Theorem 6.47 in deciding if it is a local maximum or minimum or neither.

6.5 Differentiable convex functions

The following theorem is proved using Lemma 6.14 and Theorem 6.37 . It immediately implies Corollary 6.52 , which is the result mostly used.
Let $f: (a, b) \rightarrow \mathbb{R}$ be a differentiable function. Then $f$ is convex if and only if $f'$ is increasing. If $f'$ is strictly increasing, then $f$ is strictly convex.
Theorem 6.51 leads to the following all important result.
Let $f: (a, b) \rightarrow \mathbb{R}$ be a twice differentiable function. Then $f$ is convex if and only if $f''(x) \geq 0$ for every $x \in (a, b)$. If $f''(x) > 0$ for every $x \in (a, b)$, then $f$ is strictly convex.
Wait! Stop! Why did I not write that $f''(x) > 0$ for every $x$ if and only if $f$ is strictly convex?
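For instance (an illustrative example, chosen here for concreteness): $f(x) = x^2 + \cos(x)$ satisfies
$$f''(x) = 2 - \cos(x) \geq 1 > 0 \quad\text{for every } x,$$
so $f$ is strictly convex by Corollary 6.52.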
Which of the properties below are true for the function ?
  1. It is convex on .
  2. It is strictly convex on .
  3. It is strictly convex on .
  4. It is convex on .
  5. Since , it must have a local minimum for .
You cannot deduce from Corollary 6.52 that the function given by is a strictly convex function. Why not?
You can deduce from Corollary 6.52 that is a strictly convex function. How can be used to prove that is a strictly convex function?
Show that is a strictly convex function .
Show that is a strictly convex function .
Show that given by
is a strictly convex function.
Another nice application of Lemma 6.14 (and Theorem 6.51 ) is the following.
Let $f: (a, b) \rightarrow \mathbb{R}$ be a differentiable function. Then $f$ is convex if and only if
$$f(y) \geq f(x) + f'(x)\,(y - x)$$
for every $x, y \in (a, b)$.
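A quick sanity check (with a function chosen here for concreteness): for $f(x) = x^2$, the inequality reads
$$y^2 \geq x^2 + 2x\,(y - x) \iff (y - x)^2 \geq 0,$$
i.e., the graph of a convex function lies on or above each of its tangent lines.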
Suppose that $f$ is a differentiable convex function and $x_0$ is a critical point for $f$. What can you say about $x_0$ using Theorem 6.58?